IETF LC comments on draft-ietf-httpbis-header-compression-10

Julian Reschke <julian.reschke@xxxxxxxxxxxxx> · Sat, 10 Jan 2015 17:58:46 +0100

Hi there,

these comments are mostly editorial; the main exception being the 
contents of the predefined static table.

Best regards, Julian

1.  Introduction

   In HTTP/1.1 (see [RFC7230]), header fields are not compressed.  As
   Web pages have grown to include dozens to hundreds of requests, the

s/include/cause/ (the page doesn't really include requests)

1.1.  Overview

   The format defined in this specification treats a list of header
   fields as an ordered collection of name-value pairs that can include
   duplicates.  Names and values are considered to be opaque sequences
   of octets, and the order of header fields is preserved after being
   compressed and decompressed.

When we say duplicate here, it's about duplicate *names*, right?

2.1.  Header List Ordering

   HPACK preserves the ordering of header fields inside the header list.
   An encoder MUST order header field representations in the header
   block according to their ordering in the original header list.  A
   decoder MUST order header fields in the decoded header list according
   to their ordering in the header block.

Just before we said that we do not define the encoder. Maybe reprase the 
first MUST as a statement of fact.

2.3.2.  Dynamic Table

   The dynamic table can contain duplicate entries.  Therefore,
   duplicate entries MUST NOT be treated as an error by a decoder.

What's the definition of "duplicate" here? Name and value match?

3.2.  Header Field Representation Processing

   The processing of a header block to obtain a header list is defined
   in this section.  To ensure that the decoding will successfully
   produce a header list, a decoder MUST obey the following rules.

"This section defines..."

4.1.  Calculating Table Size

   The size of the dynamic table is the sum of the size of its entries.

   The size of an entry is the sum of its name's length in octets (as
   defined in Section 5.2), its value's length in octets (see
   Section 5.2), plus 32.

Remove one of the references to Section 5.2.

   The size of an entry is calculated using the length of the name and
   value without any Huffman encoding applied.

This is the first time that Huffman is mentioned; it would probably be 
good to mention it earlier on.

   NOTE: The additional 32 octets account for the overhead associated
   with an entry.  For example, an entry structure using two 64-bit
   pointers to reference the name and the value of the entry, and two
   64-bit integers for counting the number of references to the name and
   value would have 32 octets of overhead.

maybe say "estimated overhead"?

5.1.  Integer Representation

   Integers are used to represent name indexes, pair indexes or string
   lengths.  To allow for optimized processing, an integer
   representation always finishes at the end of an octet.

First use of "pair index"; at this point it's not clear what that might 
be.

   Pseudo-code to represent an integer I is as follows:

   if I < 2^N - 1, encode I on N bits
   else
       encode (2^N - 1) on N bits
       I = I - (2^N - 1)
       while I >= 128
            encode (I % 128 + 128) on 8 bits
            I = I / 128
       encode I on 8 bits

   Pseudo-code to decode an integer I is as follows:

   decode I from the next N bits
   if I < 2^N - 1, return I
   else
       M = 0
       repeat
           B = next octet
           I = I + (B & 127) * 2^M
           M = M + 7
       while B & 128 == 128
       return I

We have bad experience with making indentation in artwork significant; 
maybe introducing brackets would be a good idea.

   This integer representation allows for values of indefinite size.  It
   is also possible for an encoder to send a large number of zero
   values, which can waste octets and could be used to overflow integer
   values.  Excessively large integer encodings - in value or octet
   length - MUST be treated as a decoding error.  Different limits can
   be set for each of the different uses of integers, based on
   implementation constraints.

Having a MUST here when we don't say what "excessive" is seems like a 
bad idea.

5.2.  String Literal Representation

   Header field names and header field values can be represented as
   literal string.  A literal string is encoded as a sequence of octets,
   either by directly encoding the literal string's octets, or by using
   a Huffman code (see [HUFFMAN]).

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | H |    String Length (7+)     |
   +---+---------------------------+
   |  String Data (Length octets)  |
   +-------------------------------+

                  Figure 4: String Literal Representation

   A literal string representation contains the following fields:

   H: A one bit flag, H, indicating whether or not the octets of the
      string are Huffman encoded.

   String Length:  The number of octets used to encode the string
      literal, encoded as an integer with 7-bit prefix (see
      Section 5.1).

   String Data:  The encoded data of the string literal.  If H is '0',
      then the encoded data is the raw octets of the string literal.  If
      H is '1', then the encoded data is the Huffman encoding of the
      string literal.

The figure could be interpreted as if the string length is always 
expressed in 7 bits, which I believe is not true.

6.2.1.  Literal Header Field with Incremental Indexing

   A literal header field with incremental indexing representation
   results in appending a header field to the decoded header list and
   inserting it as a new entry into the dynamic table.

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 |      Index (6+)       |
   +---+---+-----------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+

    Figure 6: Literal Header Field with Incremental Indexing - Indexed
                                   Name

See above. One can read this as "H" being bit 0 in the second octet; but 
that's not always the case. I think it would be good to revise the 
figure format.

6.3.  Dynamic Table Size Update

   The new maximum size MUST be lower than or equal to the last value of
   the SETTINGS_HEADER_TABLE_SIZE parameter (see Section 6.5.2 of
   [HTTP2]) received from the decoder and acknowledged by the encoder
   (see Section 6.5.3 of [HTTP2]).

Some parts of the spec read as if HPACK doesn't have direct dependency 
on HTTP/2 or HTTP in general. However, here a normative dependency is 
introduced.

7.1.  Probing Dynamic Table State

   This is possible even over TLS, because while TLS provides
   confidentiality protection for content, it only provides a limited
   amount of protection for the length of that content.

Expand "TLS" on first use, and add a reference.

9.  References

9.1.  Normative References

   [HTTP2]      Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
                Transfer Protocol version 2",
                draft-ietf-httpbis-http2-16 (work in progress),
                October 2014.

November (already fixed in Git).

9.2.  Informative References

   [CRIME]      Rizzo, J. and T. Duong, "The CRIME Attack",
                September 2012, <https://docs.google.com/a/twist.com/
                presentation/d/
                11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/
                edit#slide=id.g1eb6c1b5_3_6>.

The fragment identifier on the URI appears to take as to the last slide; 
maybe drop it?

   [HUFFMAN]    Huffman, D., "A Method for the Construction of Minimum
                Redundancy Codes", Proceedings of the Institute of Radio
                Engineers Volume 40, Number 9, pp. 1098-1101,
                September 1952, <https://ieeexplore.ieee.org/xpl/
                articleDetails.jsp?arnumber=4051119>.

That URI leads me to a broken page due to mixed content problems. The 
HTTP version works fine.

   [SPDY]       Belshe, M. and R. Peon, "SPDY Protocol",
                draft-mbelshe-httpbis-spdy-00 (work in progress),
                February 2012.

It's not work in progress, and we should have a URI for it. Yes, that's 
something the RFC Editor will need to figure out :-).

Appendix A.  Static Table Definition

   The static table (see Section 2.3.1) consists of a predefined and
   unchangeable list of header fields.

   The static table was created by listing the most common header fields
   that are valid for messages exchanged inside a HTTP/2 connection.

Is that really true? How come we have Proxy-Auth* in it then, and also 
an unregistered header field ("refresh")? I believe we need to document 
how we came up with this list.

          | 16    | accept-encoding             | gzip, deflate |

Appendix B.  Huffman Code

   As an example, the code for the symbol 47 (corresponding to the ASCII
   character "/") consists in the 6 bits "0", "1", "1", "0", "0", "0".

"consists of"?

   This corresponds to the value 0x18 (in hexadecimal) encoded on 6
   bits.

"encoded in"?

C.3.

It would be cool to have an example with repeating header fields (maybe 
two instances of Cache-Control)?