Runtime of KMP algorithm and LPS table construction

Keywords: runtime, algorithm, LPS, KMP. Last updated: 2023-10-16

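I recently came across the KMP algorithm and have spent a lot of time trying to understand why it works. I now understand the basic mechanics, but I still do not understand the runtime analysis.

I took the following code from the GeeksforGeeks site: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/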

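The site claims that if the text has size O(n) and the pattern has size O(m), KMP performs the search in O(n) time. It also states that the LPS array can be computed in O(m) time.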

// C++ program for implementation of KMP pattern searching
// algorithm
#include <cstdio>
#include <cstring>
#include <vector>

void computeLPSArray(char* pat, int M, int* lps);

// Prints occurrences of pat[] in txt[]
void KMPSearch(char* pat, char* txt)
{
    int M = std::strlen(pat);
    int N = std::strlen(txt);

    // create lps[] that will hold the longest prefix suffix
    // values for pattern (std::vector rather than int lps[M],
    // because variable-length arrays are not standard C++)
    std::vector<int> lps(M);

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps.data());

    int i = 0; // index for txt[]
    int j = 0; // index for pat[]
    while (i < N) {
        if (pat[j] == txt[i]) {
            j++;
            i++;
        }
        if (j == M) {
            std::printf("Found pattern at index %d\n", i - j);
            j = lps[j - 1];
        }
        // mismatch after j matches
        else if (i < N && pat[j] != txt[i]) {
            // Do not match lps[0..lps[j-1]] characters,
            // they will match anyway
            if (j != 0)
                j = lps[j - 1];
            else
                i = i + 1;
        }
    }
}

// Fills lps[] for given pattern pat[0..M-1]
void computeLPSArray(char* pat, int M, int* lps)
{
    // length of the previous longest prefix suffix
    int len = 0;
    lps[0] = 0; // lps[0] is always 0

    // the loop calculates lps[i] for i = 1 to M-1
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else // (pat[i] != pat[len])
        {
            // This is tricky. Consider the example.
            // AAACAAAA and i = 7. The idea is similar
            // to search step.
            if (len != 0) {
                len = lps[len - 1];
                // Also, note that we do not increment
                // i here
            }
            else // if (len == 0)
            {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Driver program to test above function
int main()
{
    char txt[] = "ABABDABACDABABCABAB";
    char pat[] = "ABABCABAB";
    KMPSearch(pat, txt);
    return 0;
}

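I am really confused about why this is the case.

For the LPS computation, consider: aaaaa-caaac. In this case, when we try to compute the LPS value for the first c, we keep falling back until we reach lps[0], which is 0, and stop. So, essentially, we go back at least the length of the pattern up to that point. If this happens multiple times, how can the time complexity be O(m)?

I have a similar confusion about the runtime of the KMP search being O(n).

Before posting, I read other threads on Stack Overflow as well as various other sites on the topic.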

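One way to establish an upper bound on the runtime of building the LPS array is to consider a pathological case: how can we maximize the number of times len = lps[len - 1] is executed? Consider the following string, ignoring spaces: x1 x2 x1x3 x1x2x1x4 x1x2x1x3x1x2x1x5 ...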

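The second term needs to be compared against the first, because if it ended in x1 instead of x2 it would match the first term. Similarly, the third term needs to be compared against the first two, because if it ended in x1 or x2 instead of x3 it would match those partial terms. And so on.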

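In the example string it is clear that only every 1/2^n-th character can be matched n times, so the total runtime would be m + m/2 + m/4 + ... = 2m = O(m), where m is the length of the pattern string. I suspect it is impossible to construct a string with a worse runtime than the example string, and this can probably be proven formally.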
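
The bound can also be checked without relying on one particular string. Below is a minimal sketch that reruns the same two loops as the quoted code while counting iterations. The names LpsStats, SearchStats, lpsWithStats and searchWithStats are hypothetical helpers introduced here for illustration; they are not part of the GeeksforGeeks code. The reasoning is the usual amortized argument: len (or j in the search) grows by at most one each time i advances, and every fallback shrinks it by at least one, so there cannot be more fallbacks than advances. The asserts inside the functions encode that bound as a sanity check, not as a proof.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper types: result plus step counters.
struct LpsStats {
    std::vector<int> lps;
    long iterations = 0; // total passes through the while loop
    long fallbacks = 0;  // how often `len = lps[len - 1]` ran
};

struct SearchStats {
    std::vector<int> matches; // start indices of occurrences in the text
    long iterations = 0;      // total passes through the while loop
};

// Same logic as computeLPSArray, instrumented with counters.
LpsStats lpsWithStats(const std::string& pat) {
    LpsStats s;
    const int m = static_cast<int>(pat.size());
    s.lps.assign(m, 0);
    int len = 0;
    int i = 1;
    while (i < m) {
        ++s.iterations;
        if (pat[i] == pat[len]) {
            ++len;                // len grows by exactly one ...
            s.lps[i] = len;
            ++i;                  // ... and only when i advances
        } else if (len != 0) {
            len = s.lps[len - 1]; // fallback: len shrinks by at least one
            ++s.fallbacks;
        } else {
            s.lps[i] = 0;
            ++i;
        }
    }
    // i advances exactly m - 1 times, so len is incremented at most m - 1
    // times and can therefore be decremented at most m - 1 times.
    assert(m == 0 || s.fallbacks <= m - 1);
    assert(m == 0 || s.iterations <= 2L * (m - 1));
    return s;
}

// Same logic as KMPSearch, instrumented with a counter.
SearchStats searchWithStats(const std::string& pat, const std::string& txt) {
    SearchStats s;
    const int m = static_cast<int>(pat.size());
    const int n = static_cast<int>(txt.size());
    if (m == 0) return s; // assumption: an empty pattern reports no matches
    const std::vector<int> lps = lpsWithStats(pat).lps;
    int i = 0, j = 0;
    while (i < n) {
        ++s.iterations;
        if (pat[j] == txt[i]) { ++i; ++j; }
        if (j == m) {
            s.matches.push_back(i - j);
            j = lps[j - 1];
        } else if (i < n && pat[j] != txt[i]) {
            if (j != 0) j = lps[j - 1]; // fallback, i stays put
            else ++i;
        }
    }
    // Every pass either advances i (at most n times) or only falls back,
    // and fallbacks cannot outnumber the increments of j (at most n).
    assert(s.iterations <= 2L * n);
    return s;
}
```

For instance, lpsWithStats("AAACAAAA") produces the table 0 1 2 0 1 2 3 3 with 3 fallbacks over 10 loop iterations, comfortably inside the 2(m - 1) = 14 limit for m = 8. A long run of fallbacks at one position is always "paid for" by the earlier matches that built up len, which is why repeated fallbacks do not push the total beyond O(m) for the table or O(n) for the search.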