Knuth-Morris-Pratt Algorithm
Knuth-Morris-Pratt Algorithm
Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case.
The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window. We take advantage of this information to avoid matching the characters that we know will anyway match.
Algorithm:
findPrefix(pattern, m, prefArray)
Begin length := 0 prefArray[0] := 0 for all character index ‘i’ of pattern, do if pattern[i] = pattern[length], then increase length by 1 prefArray[i] := length else if length ≠ 0 then length := prefArray[length - 1] decrease i by 1 else prefArray[i] := 0 done End
kmpAlgorithm(text, pattern)
Begin n := size of text m := size of pattern call findPrefix(pattern, m, prefArray) while i < n, do if text[i] = pattern[j], then increase i and j by 1 if j = m, then print the location (i-j) as there is the pattern j := prefArray[j-1] else if i < n AND pattern[j] ≠ text[i] then if j ≠ 0 then j := prefArray[j - 1] else increase i by 1 done End
Example:
#include<iostream> using namespace std; void findPrefix(string pattern, int m, int prefArray[]) { int length = 0; prefArray[0] = 0; //first place is always 0 as no prefix for(int i = 1; i<m; i++) { if(pattern[i] == pattern[length]) { length++; prefArray[i] = length; }else { if(length != 0) { length = prefArray[length - 1]; i--; //decrease i to avoid effect of increasing after iteration }else prefArray[i] = 0; } } } void kmpPattSearch(string mainString, string pattern, int *locArray, int &loc) { int n, m, i = 0, j = 0; n = mainString.size(); m = pattern.size(); int prefixArray[m]; //prefix array as same size of pattern findPrefix(pattern, m, prefixArray); loc = 0; while(i < n) { if(mainString[i] == pattern[j]) { i++; j++; } if(j == m) { locArray[loc] = i-j; //item found at i-j position. loc++; j = prefixArray[j-1]; //get the prefix length from array }else if(i < n && pattern[j] != mainString[i]) { if(j != 0) j = prefixArray[j-1]; else i++; } } } int main() { string str = "AAAABAAAAABBBAAAAB"; string patt = "AAAB"; int locationArray[str.size()]; int index; kmpPattSearch(str, patt, locationArray, index); for(int i = 0; i<index; i++) { cout << "Pattern found at location: " <<locationArray[i] << endl; } }
Comments
Post a Comment