Text file as input in C++ program will not work unless the text is copied and pasted

Keywords: text, copy, program, C++, input file not working. Updated: 2023-10-16

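I have a very strange bug in my code that is a bit hard to explain. Let me start with what the program does: the C++ program takes input text (from a file named "input.txt" in the same directory), uses Markov chains to generate artificial output text that resembles the style of the input, and prints it to the terminal.

It works when I copy and paste the text of "Alice in Wonderland" (http://paulo-jorente.de/text/alice_oz.txt) directly into "input.txt", but if I add any words or characters to the beginning or end of the file's contents, the code stops running (or runs indefinitely). This does not happen if I add text anywhere in the middle of the file's contents.

If you want to test it yourself, try running the code with Alice in Wonderland copied into "input.txt". After it runs successfully, open input.txt, type some random characters or words after the last piece of the Alice text ("...home again!"), and run it again; it will fail.

Here is the code: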

#include <ctime>
#include <iostream>
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
#include <map>
using namespace std;
class markovTweet{
    string fileText;
    map<string, vector<string> > dictionary;
public:
    void create(unsigned int keyLength, unsigned int words) {
        ifstream f("input.txt");
        if(f.good()){
            fileText.assign((istreambuf_iterator<char>(f)), istreambuf_iterator<char>());
        }else{
            cout << "File cannot be read. Ensure there is a file called input.txt in this directory." << "\n" << endl;
            return;
        }
        if(fileText.length() < 1){
            return;
        }
        cout << "\n" << "file imported" << "\n";
        createDictionary(keyLength);
        cout << "\n" << "createDictionary" << "\n" << "\n";
        createText(words - keyLength);
        cout << "\n" << "text created, done" << endl;
    }
private:
    void createText(int w) {
        string key, first, second;
        size_t next;
        map<string, vector<string> >::iterator it = dictionary.begin();
        advance( it, rand() % dictionary.size() );
        key = (*it).first;
        cout << key;
        while(true) {
            vector<string> d = dictionary[key];
            if(d.size() < 1) break;
            second = d[rand() % d.size()];
            if(second.length() < 1) break;
            cout << " " << second;
            if(--w < 0) break;
            next = key.find_first_of( 32, 0 );
            first = key.substr( next + 1 );
            key = first + " " + second;
        }
        cout << "\n";
    }
    void createDictionary(unsigned int kl) {
        string w1, key;
        size_t wc = 0, pos, next;
        next = fileText.find_first_not_of( 32, 0 );
        if(next == string::npos) return;
        while(wc < kl) {
            pos = fileText.find_first_of(' ', next);
            w1 = fileText.substr(next, pos - next);
            key += w1 + " ";
            next = fileText.find_first_not_of(32, pos + 1);
            if(next == string::npos) return;
            wc++;
        }
        key = key.substr(0, key.size() - 1);
        while(true) {
            next = fileText.find_first_not_of(32, pos + 1);
            if(next == string::npos) return;
            pos = fileText.find_first_of(32, next);
            w1 = fileText.substr(next, pos - next);
            if(w1.size() < 1) break;
            if(find( dictionary[key].begin(), dictionary[key].end(), w1) == dictionary[key].end() )
                dictionary[key].push_back(w1);
            key = key.substr(key.find_first_of(32) + 1) + " " + w1;
        }
    }
};
int main() {
    markovTweet t;
    cout << "\n" << "Artificially generated tweet using Markov Chains based off of input.txt: " << "\n" << "\n";
    //lower first number is more random sounding text, second number is how long output is.
    t.create(4, 30);
    return 0;
}

This is a very strange bug, and I would greatly appreciate any help you can offer. Thanks!

This may be something to think about regarding the time complexity of std::map's operator[]().

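Using operator[]: "[]" can also be used to insert elements in the map. It is similar to the functions above and returns the pointer to the newly constructed element. The difference is that this operator always constructs a new element, i.e. even if a value is not mapped to a key, the default constructor is called and a "null" or "empty" value is assigned to the key. The size of the map is always increased by 1. Time complexity: log(n), where n is the size of the map.
Source: GeeksforGeeks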
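
As a small check of the quoted description (a minimal sketch, not part of the original post): in standard C++, `std::map::operator[]` returns a reference to the mapped value rather than a pointer, and it inserts a default-constructed value only when the key is absent, so the size grows by one only in that case. The function name `sizeAfterLookups` is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: shows how std::map::operator[] affects the map's size.
// Returns the size of the map after three lookups.
std::size_t sizeAfterLookups() {
    std::map<std::string, std::vector<std::string>> dictionary;

    dictionary["a b"].push_back("c"); // key absent: inserts an empty vector, then appends
    dictionary["a b"].push_back("d"); // key present: no insertion, size unchanged
    (void)dictionary["x y"];          // key absent: inserts an empty vector as a side effect

    return dictionary.size();
}
```

The last lookup is the case worth watching for in the question's code: merely reading `dictionary[key]` for an unknown key adds an entry.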

In your class's createDictionary() function, try adding this line of code (the std::cout statement) inside the second while loop:

{
    //...code
    if (find(dictionary[key].begin(), dictionary[key].end(), w1) == dictionary[key].end()) {
        dictionary[key].push_back(w1);
        std::cout << dictionary.size() << std::endl; // print the map size after each new entry
    }
    //code...
}
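
Since each `dictionary[key]` in that condition is a separate O(log n) lookup, the same check can be written with a single lookup by binding the returned reference once. This is only a sketch of that idea, not the original poster's code; the alias `Dictionary` and the function `addUnique` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Dictionary = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: appends `word` to the followers of `key` unless it is
// already there. Returns true when the word was added.
bool addUnique(Dictionary& dictionary, const std::string& key, const std::string& word) {
    std::vector<std::string>& followers = dictionary[key]; // single lookup
    if (std::find(followers.begin(), followers.end(), word) != followers.end()) {
        return false; // already recorded for this key
    }
    followers.push_back(word);
    return true;
}
```

This does not change what ends up in the map; it only reduces the number of tree lookups per word from up to four to one.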

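When I copied the text from the file, it generated 62037 entries in your dictionary (the std::map). A full run took about 20-30 seconds.

When I appended the text "Good Bye!" to the end of the file, saved it, and ran the program in the debugger, it generated 62039 entries. Again, it took about 20-30 seconds to run.

Then I added the text "Hello World" to the beginning of the file, saved it, and ran the program in the debugger, and it generated 62041 entries. Again, it took about 20-30 seconds to run.

However, a few times during this process it generated that many entries in the map and the code kept looping... One time the count was somewhere between 620xx and 640xx. I don't know what caused it to generate that many keys... but as I said, a few times it stopped printing values while still iterating the same while loop, and the size of the map was no longer increasing...

This happened the first time I entered text at the beginning of the file while text was also appended at the end. That was when I decided to print out the size of your map and noticed the infinite loop... I then stopped the debugger, went back to the text file, kept the inserted text at the beginning, and removed the appended text at the end, making sure to leave a single space at the end of the text.

This time, when I ran the program in the debugger, it worked correctly and generated 62039 entries. Again, it took about 20-30 seconds to run. After that first successful run with text inserted at the beginning, I added text at the end, and it ran fine. I even tried typing "Hello World!" into the text file with a line break before "Good Bye!", and it still worked.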

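So yes, something is causing a bug, but I don't know exactly what. However, I believe I have traced it to this while loop and the conditional branches that exit it... It should have broken out of this loop and gone on to the createText function, but it never did. The conditions are:

if (next == std::string::npos) return;

and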

if (w1.size() < 1) break;

Somehow, neither of them was being met.
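
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: when the text does not end with a space, `find_first_of` returns `std::string::npos` for the last word, and the next iteration computes `pos + 1`. Because `npos` is the largest `size_t` value, `npos + 1` wraps around to 0, so the search would restart at the beginning of the text and neither exit condition would be reached. That would be consistent with the observation that leaving a trailing space made the program finish. The sketch below reproduces only that index arithmetic; `nextSearchStart` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the loop's index arithmetic: find the space
// after the word starting at `wordStart`, then return where the loop would
// resume searching (pos + 1).
std::size_t nextSearchStart(const std::string& text, std::size_t wordStart) {
    std::size_t pos = text.find_first_of(' ', wordStart);
    // If no space follows the word, pos == std::string::npos, and unsigned
    // arithmetic wraps npos + 1 around to 0.
    return pos + 1;
}
```

If this is indeed the cause, checking `pos == std::string::npos` right after the last word is stored and leaving the loop at that point would end the scan regardless of how the file ends.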

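The time complexity is acceptable: not the best, but not the worst either, since each lookup runs in O(log n) time with about 62-63k entries. That does not include the space complexity, which also needs to be taken into account.

It may be that on one run you hit a stack overflow that leads to the infinite loop, while on the next run you do not. I don't think it is related to adding text directly to the text file, except that doing so increases the size of the map, with its O(log n) lookups, and increases the space complexity.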

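Regardless of what you add to this text file, once it is saved, your program as written reads the entire contents of the file character by character through the iterator class (istreambuf_iterator) and stores them in a single string, fileText. After this string is constructed, the class's member string holds about 336940 characters.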
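
The reading step can be looked at in isolation. The following is a minimal sketch that takes any `std::istream`, so it can be tried with a `std::istringstream` instead of `input.txt`; `readAll` and `endsWithSpace` are hypothetical names. Inspecting the last character is a cheap way to see whether a saved file still ends with a space, which the test runs described above suggest may matter.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reads every character from a stream into a string,
// the same way the program fills fileText.
std::string readAll(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: reports whether the text ends with a space character.
bool endsWithSpace(const std::string& text) {
    return !text.empty() && text.back() == ' ';
}
```

With a real file, passing a `std::ifstream` opened on "input.txt" to `readAll` and printing the string's `size()` should give the character count mentioned above.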

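Hopefully this information helps you narrow down where the bug is in your program and determine what is really causing it. It is indeed hard to pin down the culprit.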
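
If the loop's exit conditions turn out to be the problem, one way to sidestep the manual index arithmetic is to split the text into words with stream extraction. This is a swapped-in technique, not the original poster's method, and it behaves slightly differently: `operator>>` splits on any whitespace (including newlines), whereas the original code splits on the space character only. The names `Dictionary` and `buildDictionary` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Dictionary = std::map<std::string, std::vector<std::string>>;

// Hypothetical alternative to createDictionary: splits on any whitespace with
// operator>>, so the loop ends when the words run out.
Dictionary buildDictionary(const std::string& text, std::size_t keyLength) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    Dictionary dictionary;
    if (keyLength == 0) return dictionary;
    for (std::size_t i = 0; i + keyLength < words.size(); ++i) {
        // Key = keyLength consecutive words joined by single spaces.
        std::string key = words[i];
        for (std::size_t k = 1; k < keyLength; ++k) key += " " + words[i + k];

        // Record the word that follows the key, without duplicates.
        std::vector<std::string>& followers = dictionary[key];
        const std::string& next = words[i + keyLength];
        if (std::find(followers.begin(), followers.end(), next) == followers.end())
            followers.push_back(next);
    }
    return dictionary;
}
```

Because the loop is bounded by the number of words, it terminates whether or not the text ends with a space.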