Cumulative compression of a growing buffer (C++, zlib)

Keywords: compression, zlib, buffer, C++      Updated: 2023-10-16

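I have a buffer (a string) that grows over time, and I need to send it through a channel whose input size is limited (4096 bytes). Communication over this channel is expensive, which is why it is better to send compressed data. The buffer grows by chunks of varying sizes. These chunks cannot be split, or their meaning is lost.

I currently use zlib in C++ for the compression, with a limit on the buffer size. When this limit is reached, the string is compressed and sent to the channel. This works, but it is not optimal: the limit has to be kept fairly low so that no information is lost (the channel input is limited to 4096 bytes).

My idea is to use zlib to build a growing compressed buffer out of compressed chunks of different sizes, and to stop the process just before the channel input limit is reached. Does zlib allow working with compressed chunks of different sizes, or do I need another algorithm?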

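The simplest solution is to convert the out-of-band packet description into an in-band format. By far the easiest case is when your input chunks do not use all 256 possible byte values. For example, when the value 00 never occurs in a chunk, it can be used to separate the chunks before compression. Otherwise, you will need an escape code.

Either way, you compress one continuous stream that contains the chunk separators. On the receiving side, you decompress the stream, recognize the separators, and reassemble the chunks.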
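For the case where every byte value can occur in a chunk, the escape code could look roughly like the sketch below. The byte values and the function names (`frameChunk`, `unframe`) are assumptions made for illustration, not something the answer prescribes. The framed stream is what would be handed to the compressor, and `unframe` would run on the decompressed stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical framing bytes chosen for this sketch.
constexpr uint8_t SEP = 0x00;      // marks the end of a chunk
constexpr uint8_t ESC = 0x01;      // introduces an escaped byte
constexpr uint8_t ESC_SEP = 0x02;  // ESC ESC_SEP stands for a literal 0x00
constexpr uint8_t ESC_ESC = 0x03;  // ESC ESC_ESC stands for a literal 0x01

// Append one chunk to the stream, escaping SEP and ESC, then close it with SEP.
void frameChunk(const std::string &chunk, std::vector<uint8_t> &stream) {
    for (unsigned char c : chunk) {
        if (c == SEP) {
            stream.push_back(ESC);
            stream.push_back(ESC_SEP);
        } else if (c == ESC) {
            stream.push_back(ESC);
            stream.push_back(ESC_ESC);
        } else {
            stream.push_back(c);
        }
    }
    stream.push_back(SEP);
}

// Split a framed (already decompressed) stream back into the original chunks.
std::vector<std::string> unframe(const std::vector<uint8_t> &stream) {
    std::vector<std::string> chunks;
    std::string current;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        const uint8_t b = stream[i];
        if (b == SEP) {
            chunks.push_back(current);
            current.clear();
        } else if (b == ESC && i + 1 < stream.size()) {
            // The byte after ESC says which literal value was escaped.
            const uint8_t code = stream[++i];
            current.push_back(static_cast<char>(code == ESC_SEP ? SEP : ESC));
        } else {
            current.push_back(static_cast<char>(b));
        }
    }
    return chunks;
}
```

Because a raw 0x00 only ever appears as a separator in the framed stream, the receiver can cut the stream at any 0x00 without looking at the chunk contents.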

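You can simply do continuous zlib compression, sending data on your channel every time 4K of compressed data has been generated. On the other side, you need to make sure that the decompressor is fed the 4K pieces of compressed data in the correct order.

The deflate algorithm in zlib is bursty: it accumulates on the order of 16K to 64K or more of data internally before emitting any compressed data, then delivers a block of compressed data, and then goes back to accumulating. So there is latency unless you ask deflate to flush. If you want to reduce that latency, you can get smaller blocks by flushing, with some small impact on compression.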

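I managed to design a compressor that sends the growing buffer part by part through a channel with a limited input size. I am posting the answer here for anyone working on the same problem. Thanks to Mark Adler and the others who put me on the right path.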

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>
#include <zlib.h>

class zStreamManager {
    public:
        zStreamManager() = default;
        ~zStreamManager();
        void endStream();
        void addToStream(const void *inData, size_t inDataSize);
    private:
        // Base64 turns every 3 bytes into 4 characters, so 3050 compressed
        // bytes encode to 4068 characters, which stays under the 4096-byte
        // channel limit even with the one-character delimiter appended.
        static constexpr size_t CHUNK_IN = 1024, CHUNK_OUT = 3050;
        const std::string base64Chars =
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "abcdefghijklmnopqrstuvwxyz"
         "0123456789+/";
        bool deallocated = true;
        z_stream stream{};
        std::vector<uint8_t> outBuffer;
        std::string base64Encode(std::vector<uint8_t> &str);
};
zStreamManager::~zStreamManager() {
    endStream();
}
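void zStreamManager::endStream() {
    if(deallocated)
        return;
    deallocated = true;
    // Use a fresh output buffer: the one addToStream() handed to zlib was
    // local to that call and no longer exists.
    std::vector<uint8_t> tempBuffer(CHUNK_IN);
    stream.next_in = Z_NULL;
    stream.avail_in = 0;
    int response = Z_OK;
    // Z_FINISH has to be repeated while deflate() keeps returning Z_OK;
    // Z_STREAM_END signals that the stream is complete.
    do {
        stream.next_out = tempBuffer.data();
        stream.avail_out = static_cast<uInt>(tempBuffer.size());
        response = deflate(&stream, Z_FINISH);
        const size_t have = tempBuffer.size() - stream.avail_out;
        outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have);
    } while(response == Z_OK);
    deflateEnd(&stream);
    // Keep every message within CHUNK_OUT: full parts end with "|",
    // the last part ends with "$". Parts are base64-encoded before sending.
    while(outBuffer.size() > CHUNK_OUT) {
        std::vector<uint8_t> zipped(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        outBuffer.erase(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        SEND << base64Encode(zipped) << "|";
    }
    if(!outBuffer.empty())
        SEND << base64Encode(outBuffer) << "$";
    outBuffer.clear();
}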
void zStreamManager::addToStream(const void *inData, size_t inDataSize) {
    if(inData == nullptr || inDataSize == 0)
        return; // nothing to compress
    if(deallocated) {
        stream.zalloc = Z_NULL;
        stream.zfree = Z_NULL;
        stream.opaque = Z_NULL;
        if(deflateInit(&stream, Z_BEST_COMPRESSION) != Z_OK)
            return; // zlib could not be set up; nothing is queued
        deallocated = false;
    }
    std::vector<uint8_t> tempBuffer(CHUNK_IN);
    stream.next_in = reinterpret_cast<Bytef *>(const_cast<void *>(inData));
    stream.avail_in = static_cast<uInt>(inDataSize);
    // Z_SYNC_FLUSH makes zlib emit everything it has consumed so far.
    // A completely filled output buffer means more output may be pending,
    // so deflate() is called again until some space is left over.
    do {
        stream.next_out = tempBuffer.data();
        stream.avail_out = static_cast<uInt>(tempBuffer.size());
        if(deflate(&stream, Z_SYNC_FLUSH) == Z_STREAM_ERROR)
            break;
        const size_t have = tempBuffer.size() - stream.avail_out;
        outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have);
    } while(stream.avail_out == 0);
    // Send every complete CHUNK_OUT-sized part, base64-encoded and
    // terminated by "|"; the remainder waits for more data or endStream().
    while(outBuffer.size() >= CHUNK_OUT) {
        std::vector<uint8_t> zipped(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        outBuffer.erase(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        SEND << base64Encode(zipped) << "|";
    }
}
std::string zStreamManager::base64Encode(std::vector<uint8_t> &str) {
    /* ALTERED VERSION OF René Nyffenegger BASE64 CODE
   Copyright (C) 2004-2008 René Nyffenegger
   This source code is provided 'as-is', without any express or implied
   warranty. In no event will the author be held liable for any damages
   arising from the use of this software.
   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:
   1. The origin of this source code must not be misrepresented; you must not
      claim that you wrote the original source code. If you use this source code
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.
   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original source code.
   3. This notice may not be removed or altered from any source distribution.
   René Nyffenegger rene.nyffenegger@adp-gmbh.ch
    */
  unsigned char const* bytes_to_encode = str.data();
  size_t in_len = str.size();
  std::string ret;
  int i = 0, j = 0;
  unsigned char char_array_3[3], char_array_4[4];
  while(in_len--) {
    char_array_3[i++] = *(bytes_to_encode++);
    if (i == 3) {
      char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
      char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
      char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
      char_array_4[3] = char_array_3[2] & 0x3f;
      for(i = 0; (i <4) ; i++)
        ret += base64Chars[char_array_4[i]];
      i = 0;
    }
  }
  if(i) {
    for(j = i; j < 3; j++)
      char_array_3[j] = '\0';
    char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
    char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
    char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
    char_array_4[3] = char_array_3[2] & 0x3f;
    for(j = 0; (j < i + 1); j++)
      ret += base64Chars[char_array_4[j]];
    while((i++ < 3))
      ret += '=';
  }
  return ret;
}
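The choice of CHUNK_OUT = 3050 for a 4096-byte channel can be checked with a little arithmetic. The sketch below is only a sanity check. The helper names `base64Length` and `maxRawBytes` are made up for this example, and it assumes padded base64 (4 output characters per started group of 3 input bytes) plus a one-character delimiter per message.

```cpp
#include <cstddef>

// Length of the padded base64 encoding of n bytes: 4 characters per
// started group of 3 bytes.
constexpr std::size_t base64Length(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Largest number of raw bytes whose base64 form, followed by `extra`
// trailing characters (such as a delimiter), fits in `limit` characters.
constexpr std::size_t maxRawBytes(std::size_t limit, std::size_t extra) {
    return limit < extra ? 0 : ((limit - extra) / 4) * 3;
}

static_assert(base64Length(3050) == 4068, "3050 bytes encode to 4068 characters");
static_assert(base64Length(3050) + 1 <= 4096, "the delimiter still fits");
static_assert(maxRawBytes(4096, 1) == 3069, "upper bound with one delimiter");
```

Under those assumptions, 3050 leaves a small margin: anything up to 3069 compressed bytes per message would still fit.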

A use case:

zStreamManager zm;
std::string growingBuffer;
bool somethingToSend = true;
while(somethingToSend) {
  RECEIVE(&growingBuffer);
  if(growingBuffer.size()) {
    zm.addToStream(growingBuffer.c_str(), growingBuffer.size());
    growingBuffer.clear();
  } else {
    somethingToSend = false;
  }
}
zm.endStream();

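RECEIVE and SEND stand for the methods you use to receive the buffer and to send it through the channel. For decompression, the parts are separated by the "|" character, and the end of the whole buffer is marked by "$". Each part has to be base64-decoded, and the decoded parts are then concatenated. Finally, the result can be decompressed with zlib like any other compressed data.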
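The receiving side is not shown above, so here is a rough sketch of the reassembly step. The function names (`base64DecodeAppend`, `reassemble`) are hypothetical, and it assumes the received messages have already been concatenated, in order, into one string. Since neither "|" nor "$" belongs to the base64 alphabet, they can be used to split the text without ambiguity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one base64 part (standard alphabet, '=' padding) and append the
// bytes to `out`. Returns false on a character outside the alphabet.
bool base64DecodeAppend(const std::string &part, std::vector<uint8_t> &out) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    uint32_t bits = 0;
    int bitCount = 0;
    for (char c : part) {
        if (c == '=')
            break;  // padding: this part carries no more data
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos)
            return false;
        bits = (bits << 6) | static_cast<uint32_t>(value);
        bitCount += 6;
        if (bitCount >= 8) {
            bitCount -= 8;
            out.push_back(static_cast<uint8_t>((bits >> bitCount) & 0xFF));
        }
    }
    return true;
}

// Rebuild the compressed stream from the received text: parts end with '|',
// the last part ends with '$'. Returns false if a part is malformed or if
// no '$' has been seen yet.
bool reassemble(const std::string &received, std::vector<uint8_t> &compressed) {
    std::string part;
    for (char c : received) {
        if (c == '|' || c == '$') {
            if (!base64DecodeAppend(part, compressed))
                return false;
            part.clear();
            if (c == '$')
                return true;
        } else {
            part.push_back(c);
        }
    }
    return false;
}
```

The bytes collected in `compressed` form a single zlib stream, which can then be passed to `inflate` (or `uncompress`, if the decompressed size is known) in the usual way.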