Reviewer: Daniel Migault Review result: Has Nits Hi, I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments. The summary of the review is: Has Nits Please find my comments below Yours , Daniel [...] 3.1.1. Zstandard Frames The structure of a single Zstandard frame is as follows: +--------------------+------------+ | Magic_Number | 4 bytes | +--------------------+------------+ | Frame_Header | 2-14 bytes | +--------------------+------------+ | Data_Block | n bytes | +--------------------+------------+ | [More Data_Blocks] | | +--------------------+------------+ | [Content_Checksum] | 0-4 bytes | +--------------------+------------+ Magic_Number: 4 bytes, little-endian format. Value: 0xFD2FB528. Frame_Header: 2 to 14 bytes, detailed in Section 3.1.1.1. Data_Block: Detailed in Section 3.1.1.2. This is where data appears. <mglt>Given that Content Check sum is mentioned as optional, I believe that its size can be indicated to 4 bytes. One additional nit, the Frame Header does not take any value between 2 and 14 but actually a subset of such values. I am not sure these values should be listed instead. </mglt> [...] 3.1.1.1.1.1. Frame_Content_Size_Flag This is a 2-bit flag (equivalent to Frame_Header_Descriptor right- shifted 6 bits) specifying whether Frame_Content_Size (the decompressed data size) is provided within the header. Flag_Value provides FCS_Field_Size, which is the number of bytes used by Frame_Content_Size according to the following table: +----------------+--------+---+---+---+ | Flag_Value | 0 | 1 | 2 | 3 | +----------------+--------+---+---+---+ | FCS_Field_Size | 0 or 1 | 2 | 4 | 8 | +----------------+--------+---+---+---+ When Flag_Value is 0, FCS_Field_Size depends on Single_Segment_Flag: If Single_Segment_Flag is set, FCS_Field_Size is 1. Otherwise, FCS_Field_Size is 0; Frame_Content_Size is not provided. <mglt>Purely editorial, but it seems to me clearer not having to designations for the same entity, so maybe replacing Flag by Frame_Content_Size_Flag might be easier to read. In addition, Flag_Value looks like another variable and might represented as Frame_Content_Size_Flag value instead. Similarly, it might be clearer replacing FCS_Field_Size by Frame_Content_Size field's length. </mglt> 3.1.1.1.1.2. Single_Segment_Flag If this flag is set, data must be regenerated within a single continuous memory segment. In this case, Window_Descriptor byte is skipped, but Frame_Content_Size is necessarily present. As a consequence, the decoder must allocate a memory segment of size equal or larger than Frame_Content_Size. <mglt>I understand this information as being interpreted by the decompressor to proceed to the decompression. More specifically, the decompressed files indicates that such amount of memory will be needed to proceed to the decompression. I assume that at least the decompressed frame is expected to fit into value indicated by the Frame_Content_Size. I believe it might be useful to clearly indicate that mechanism needs to be provided to check while processing the file, the process does not go beyond the allocated memory, and if so an error needs to be returned and the decompression aborted. The main difference with the considerations mentioned below is that the check is not provided at run time. This is somewhat mentioned in the security consideration, but it might be beneficial to have this at specific places in the document.</mglt> [...] 3.1.1.1.1.3. Unused Bit A decoder compliant with this specification version shall not interpret this bit. It might be used in a future version, to signal a property that is not mandatory to properly decode the frame. An encoder compliant with this specification must set this bit to zero. <mglt>Probably normative language may be needed here (MUST/MUST NOT). SHOULD be zero for the decompressor and MUST be set to zero by the compressor. </mglt> 3.1.1.1.1.4. Reserved Bit This bit is reserved for some future feature. Its value must be zero. A decoder compliant with this specification version must ensure it is not set. This bit may be used in a future revision, to signal a feature that must be interpreted to decode the frame correctly. <mglt>Normative language may be appropriated as well.</mglt> [...] 3.1.1.1.1.6. Dictionary_ID_Flag This is a 2-bit flag (= Frame_Header_Descriptor & 0x3) indicating whether a dictionary ID is provided within the header. It also specifies the size of this field as DID_Field_Size: +----------------+---+---+---+---+ | Flag_Value | 0 | 1 | 2 | 3 | +----------------+---+---+---+---+ | DID_Field_Size | 0 | 1 | 2 | 4 | +----------------+---+---+---+---+ <mglt>same comments regarding _Value and FIeld_Size></mglt> 3.1.1.1.2. Window Descriptor This provides guarantees about the minimum memory buffer required to decompress a frame. This information is important for decoders to allocate enough memory. <mglt>Same comment regarding the ability to set boundaries to the memory allocating. I have the impression that memory allocation can serve two purposes: storing the Block as well as some context associated to the processing. I am wondering if that is correct or not. I suspect that if there is a relation with the Frame Size, it might be restricted to the decompressed data. </mglt> The Window_Descriptor byte is optional. When Single_Segment_Flag is set, Window_Descriptor is not present. In this case, Window_Size is Collet & Kucherawy Expires June 22, 2020 [Page 9] Internet-Draft application/zstd December 2019 Frame_Content_Size, which can be any value from 0 to 2^64-1 bytes (16 ExaBytes). <mglt>I have the impression that some checks on Windows Size with the Frame SIze can be performed. If that is correct, I believe these should be mentioned. It seems that in any cases Window_Size <= Frame_Size. </mglt> +------------+----------+----------+ | Bit Number | 7-3 | 2-0 | +------------+----------+----------+ | Field Name | Exponent | Mantissa | +------------+----------+----------+ The minimum memory buffer size is called Window_Size. It is described by the following formulae: windowLog = 10 + Exponent; windowBase = 1 << windowLog; windowAdd = (windowBase / 8) * Mantissa; Window_Size = windowBase + windowAdd; The minimum Window_Size is 1 KB. The maximum Window_Size is (1<<41) + 7*(1<<38) bytes, which is 3.75 TB. In general, larger Window_Size values tend to improve the compression ratio, but at the cost of increased memory usage. To properly decode compressed data, a decoder will need to allocate a buffer of at least Window_Size bytes. In order to protect decoders from unreasonable memory requirements, a decoder is allowed to reject a compressed frame that requests a memory size beyond decoder's authorized range. For improved interoperability, it's recommended for decoders to support values of Window_Size up to 8 MB and for encoders not to generate frames requiring a Window_Size larger than 8 MB. It's merely a recommendation though, and decoders are free to support larger or lower limits, depending on local limitations. <mglt>I think the recommendation is to not use Window_Size larger than 8MB, which will leave minimal implementation to have a fixed Windows Size of 8MB. This also means that Windows_Size mechanism is not supported. I would be a bit reluctant to discourage implementing this mechanism as those implementation may stay forever, and make any use of larger Windows_Size than 8MB. I am wondering if the limitation should not result from deployed hardware. In which case we could write this recommendation as to limit the Windows Size to 8MB.</mglt> 3.1.1.1.3. Dictionary_ID This is a variable size field, which contains the ID of the dictionary required to properly decode the frame. This field is optional. When it's not present, it's up to the decoder to know which dictionary to use. <mglt>I understand the document does not define Dictionary_ID. It might be worth clarifying this here. </mglt> Dictionary_ID field size is provided by DID_Field_Size. DID_Field_Size is directly derived from the value of Dictionary_ID_Flag. One byte can represent an ID 0-255; 2 bytes can represent an ID 0-65535; 4 bytes can represent an ID 0-4294967295. Format is little-endian. Collet & Kucherawy Expires June 22, 2020 [Page 10] Internet-Draft application/zstd December 2019 It is permitted to represent a small ID (for example, 13) with a large 4-byte dictionary ID, even if it is less efficient. Within private environments, any dictionary ID can be used. However, for frames and dictionaries distributed in public space, Dictionary_ID must be attributed carefully. <mglt>I think that "carefully" needs to be specified a bit more. Though I suppose this is defined in the IANA section. I also suppose that low and high ranges are allocated to specific categories. If these ranges are introduced, it might worth to specify the purpose of each range. The ranges (high, low) do not map the possibles sizes (unspecified, 0-255, 256-2**16-1, 2**16-2**32-1). Maybe ths could be clarified as well. </mglt> The following ranges are reserved for use only with dictionaries that have been registered with IANA (see Section 7.4): low range: <= 32767 high range: >= (1 << 31) Any other value for Dictionary_ID can be used by private arrangement between participants. Any payload presented for decompression that references an unregistered reserved dictionary ID results in an error. 3.1.1.1.4. Frame Content Size This is the original (uncompressed) size. This information is optional. Frame_Content_Size uses a variable number of bytes, provided by FCS_Field_Size. FCS_Field_Size is provided by the value of Frame_Content_Size_Flag. FCS_Field_Size can be equal to 0 (not present), 1, 2, 4, or 8 bytes. +----------------+--------------+ | FCS Field Size | Range | +----------------+--------------+ | 0 | unknown | +----------------+--------------+ | 1 | 0 - 255 | +----------------+--------------+ | 2 | 256 - 65791 | +----------------+--------------+ | 4 | 0 - 2^32 - 1 | +----------------+--------------+ | 8 | 0 - 2^64 - 1 | +----------------+--------------+ Frame_Content_Size format is little-endian. When FCS_Field_Size is 1, 4, or 8 bytes, the value is read directly. When FCS_Field_Size is 2, the offset of 256 is added. It's allowed to represent a small size (for example 18) using any compatible variant. <mglt>Probably some checks needs to be enforced and decompressor behavior needs to be specified. </mglt> [...] 3.1.1.2.3. Block_Size The upper 21 bits of Block_Header represent the Block_Size. When Block_Type is Compressed_Block or Raw_Block, Block_Size is the size of Block_Content (hence excluding Block_Header). When Block_Type is RLE_Block, since Block_Content's size is always 1, Block_Size represents the number of times this byte must be repeated. Block_Size is limited by Block_Maximum_Size (see below). <mglt>I suspect that Block_Size may be verified against the Frame Content Size ?</mglt> 3.1.1.2.4. Block_Content and Block_Maximum_Size The size of Block_Content is limited by Block_Maximum_Size, which is the smallest of: o Window_Size o 128 KB <mglt>Given that it was recommended to support windows up to 8MB, I am wondering why blocks cannot be larger. This is probably ignorance and I apology for the question. </mglt> Block_Maximum_Size is constant for a given frame. This maximum is applicable to both the decompressed size and the compressed size of any block in the frame. The reasoning for this limit is that a decoder can read this information at the beginning of a frame and use it to allocate buffers. The guarantees on the size of blocks ensure that the buffers will be large enough for any following block of the valid frame. 3.1.1.3. Compressed Blocks To decompress a compressed block, the compressed size must be provided from the Block_Size field within Block_Header. Collet & Kucherawy Expires June 22, 2020 [Page 13] Internet-Draft application/zstd December 2019 A compressed block consists of two sections: a Literals Section (Section 3.1.1.3.1) and a Sequences_Section (Section 3.1.1.3.2). The results of the two sections are then combined to produce the decompressed data in Sequence Execution (Section 3.1.1.4). To decode a compressed block, the following elements are necessary: o Previous decoded data, up to a distance of Window_Size, or the beginning of the Frame, whichever is smaller. Single_Segment_Flag will be set in the latter case. o List of "recent offsets" from the previous Compressed_Block. <mglt>This is a general comment when execution is based on computed length, size, or involve offsets. I also susect that reading backward might result in a few additional risks. It would be good to specify the boundaries associated to these values to avoid buffer overflow. If such check ar not possible, it would be good also to raise the concern that the implementation needs to carefully handle that operation. Now that have been through the Security Consideration Section, I can see this has been done for one variable, but I am not sure others variable defined in the doc are not subject to the same type of attacks.</mglt> o The previous Huffman tree, required by Treeless_Literals_Block type. o Previous Finite State Entropy (FSE) decoding tables, required by Repeat_Mode, for each symbol type (literals lengths, match lengths, offsets). Note that decoding tables are not always from the previous Compressed_Block: o Every decoding table can come from a dictionary. o The Huffman tree comes from the previous Compressed_Literals_Block. <mglt>My knowledge in compression are too weak to comment, but my understanding is that decompression relies on decoding tables. Which means that they can influence the processing either via the dictionary, a specifically crafted compressed block,.... I am wondering how these tables can influence the execution: * does different tables end in different usage of resources? I think no. If the table ends in different outputs, this might be used to compress Content1 and decompress it in Content 2. ? If that is possible, I believe the security consideration should recommend that Content signature MUST always be provided on the decompressed content, and not the compressed content. The same considerations are of course valid for anything that influence the compression/decompression. </mglt> [...] 3.1.1.3.1.6. Jump_Table The Jump_Table is only present when there are 4 Huffman-coded streams. (Reminder: Huffman-compressed data consists of either 1 or 4 Huffman- coded streams.) If only 1 stream is present, it is a single bitstream occupying the entire remaining portion of the literals block, encoded as described within Section 4.2.2. If there are 4 streams, Literals_Section_Header only provides enough information to know the decompressed and compressed sizes of all 4 streams combined. The decompressed size of each stream is equal to (Regenerated_Size+3)/4, except for the last stream, which may be up to 3 bytes smaller, to reach a total decompressed size as specified in Regenerated_Size. The compressed size of each stream is provided explicitly in the Jump_Table. The Jump_Table is 6 bytes long and consists of three 2-byte little-endian fields, describing the compressed sizes of the first 3 streams. Stream4_Size is computed from Total_Streams_Size minus sizes of other streams. Stream4_Size = Total_Streams_Size - 6 - Stream1_Size - Stream2_Size - Stream3_Size Note that if Stream1_Size + Stream2_Size + Stream3_Size exceeds Total_Streams_Size, the data are considered corrupted. <mglt>I would maybe expect that implementation MUST check this.</mglt> Each of these 4 bitstreams is then decoded independently as a Huffman-Coded stream, as described in Section 4.2.2. [...] 4.1.1. FSE Table Description To decode FSE streams, it is necessary to construct the decoding table. The Zstandard format encodes FSE table descriptions as described here. An FSE distribution table describes the probabilities of all symbols from 0 to the last present one (included) on a normalized scale of (1 << Accuracy_Log). Note that there must be two or more symbols with non-zero probability. A bitstream is read forward, in little-endian fashion. It is not necessary to know its exact size, since the size will be discovered and reported by the decoding process. The bitstream starts by reporting on which scale it operates. If low4bits designates the lowest 4 bits of the first byte, then Accuracy_Log = low4bits + 5. <mglt>This looks like processing until the end to determine the size. If that is correct, it seems dangerous unless some measurements are take to prevent resource exhaustion. </mglt> [...] 4.2. Huffman Coding Zstandard Huffman-coded streams are read backwards, similar to the FSE bitstreams. Therefore, to find the start of the bitstream, it is necessary to know the offset of the last byte of the Huffman-coded stream. After writing the last bit containing information, the compressor writes a single 1 bit and then fills the byte with 0-7 0 bits of padding. The last byte of the compressed bitstream cannot be 0 for that reason. <mglt> fills the byte with 0-7 0 bits of... I have hard time to parse the sentence. I understand it as fills the byte by setting bits 0-7 to 0.</mglt> [...] 7. IANA Considerations <mglt>Given that the documents defines Dictionnaries, I am wondering if a registry shoudl not be created. </mglt> [...] 7.4. Dictionaries Work in progress includes development of dictionaries that will optimize compression and decompression of particular types of data. Specification of such dictionaries for public use will necessitate registration of a code point from the reserved range described in Section 3.1.1.1.3 and its association with a specific dictionary. However, there are at present no such dictionaries published for public use, so this document makes no immediate request of IANA to create such a registry. <mglt>Dictionnaries will probably be defined later through a standard process, I believe that is worth being mentioned. In addition, we may also insist on specifying the impact on resource as well as the resulting output need to be closely looked at. The problem being essentially if decompres(f, dict1) and decompress(f, dict2) provides different outputs and this property can be used by an attacker to transform the uncompressed file. </mglt> 8. Security Considerations Any data compression method involves the reduction of redundancy in the data. Zstandard is no exception, and the usual precautions apply. One should never compress a message whose content must remain secret with a message generated by a third party. Such a compression can be used to guess the content of the secret message through analysis of entropy reduction. This was demonstrated in the Compression Ratio Info-leak Made Easy (CRIME) attack [CRIME], for example. <mglt>When uncompressed data needs to be protected, it might be worth clarifying the differences associated to protecting (authenticating) the compressed data as opposed to the uncompressed data. I suspect if might be recommended to authenticate uncompressed data. </mglt> A decoder has to demonstrate capabilities to detect and prevent any kind of data tampering in the compressed frame from triggering system faults, such as reading or writing beyond allowed memory ranges. This can be guaranteed by either the implementation language or careful bound checkings. Of particular note is the encoding of Number_of_Sequences values that cause the decoder to read into the block header (and beyond), as well as the indication of a Frame_Content_Size that is smaller than the actual decompressed data, in an attempt to trigger a buffer overflow. It is highly recommended to fuzz-test (i.e., provide invalid, unexpected, or random input and verify safe operation of) decoder implementations to test and harden Collet & Kucherawy Expires June 22, 2020 [Page 42] Internet-Draft application/zstd December 2019 their capability to detect bad frames and deal with them without any adverse system side effect. An attacker may provide correctly formed compressed frames with unreasonable memory requirements. A decoder must always control memory requirements and enforce some (system-specific) limits in order to protect memory usage from such scenarios. Compression can be optimized by training a dictionary on a variety of related content payloads. This dictionary must then be available at the decoder for decompression of the payload to be possible. While this document does not specify how to acquire a dictionary for a given compressed payload, it is worth noting that third-party dictionaries may interact unexpectedly with a decoder, leading to possible memory or other resource exhaustion attacks. We expect such topics to be discussed in further detail in the Security Considerations section of a forthcoming RFC for dictionary acquisition and transmission, but highlight this issue now out of an abundance of caution. As discussed in Section 3.1.2, it is possible to store arbitrary user metadata in skippable frames. While such frames are ignored during decompression of the data, they can be used as a watermark to track the path of the compressed payload. -- last-call mailing list last-call@xxxxxxxx https://www.ietf.org/mailman/listinfo/last-call