Hi Eric, Thank you for reviewing the patch. Please see responses to your questions inline below. On Thu, 2024-03-28 at 19:46 -0700, Eric Biggers wrote: > On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote: > > The 'canned' compression mode implements a compression scheme that > > uses a statically defined set of Huffman tables, but where the > > Deflate > > Block Header is implied rather than stored with the compressed > > data. > > This already exists in standard DEFLATE; it's called fixed mode. See > section > 3.2.6 of RFC1951 > (https://datatracker.ietf.org/doc/html/rfc1951#page-12). > > I think that what's going on is that you've implemented a custom > variant of > DEFLATE where you set the fixed Huffman codes to something different > from the > ones defined in the standard. > > Is that correct, or are there other differences? > We view it as a variant of dynamic block Deflate as opposed to a variant of fixed. In particular, it is compressing the input with static Huffman tables (i.e. ones that do not vary with the input), and where the Deflate block header (which is a constant) is not stored with the compressed data. If the missing block header were to be prepended to the compressed data, the result would be a valid dynamic compressed block. One might think of this as vaguely similar to dictionary compression. In that case, the dictionary is not stored with the compressed data but is agreed to by the compressor and decompression. In the case of canned compression, the header is not stored with the compressed data but is agreed to by both entities. > Actually, looking at your zlib_tr_flush_block(), it looks instead of > using the > reserved block type value (3) or redefining the meaning of the fixed > block type > value (1), you actually deleted the BTYPE and BFINAL fields from the > data stream > entirely. So the stream no longer stores the type of block or the > flag that > indicates whether the block is the final one or not. > > That has the property that there cannot be any standard blocks, even > uncompressed blocks, included in the data stream anymore. Is that > intentional? > Conceptually, there is a valid dynamic block header associated with the compressed data, it is just not stored with the data in order to save space (since it is effectively an unchanging constant). In this usage, it is envisioned that the output would consist of a single Deflate block (i.e. the implied header is marked as BFINAL). In theory, one could have the implied block header not marked as BFINAL, and so the compressed data would need to contain at least two blocks (i.e. the body of the initial block, an EOB, and a normal block with header), but this would be counterproductive to the intended usage. > Maybe this is why you're using the name "canned", instead of going > with > something more consistent with the existing "fixed" name, like > "custom-fixed"? > Again, we view this as a variant of dynamic compression rather than as a variant of fixed compression. We use the term "static" to describe compression with a dynamic block where the tables are unchanging (i.e. not dependent on the input data) . This can allow the compression to be done in one pass rather than two. The "canned" compression uses a static table, but is different in that the header is implied rather than stored. > I wonder what the plan is for when the next hardware vendor tries to > do this and > chooses their own Huffman codes, different from yours. Or what if > Intel decides > the Huffman codes they chose aren't the best ones anymore and > releases new > hardware that uses different codes. Will we perhaps be getting a > tinned mode > too? > The Huffman tables are not built into the hardware. The Huffman tables/block header is in this case built into the software. The same tables are embedded in the zlib portion of the patch for software compression/decompression and in the hardware driver portion of the patch for IAA compression/decompression. The hardware itself can work with any desired set of tables. Thanks, Andre