Hi there,
these comments are mostly editorial; the main exception being the
contents of the predefined static table.
Best regards, Julian
1. Introduction
In HTTP/1.1 (see [RFC7230]), header fields are not compressed. As
Web pages have grown to include dozens to hundreds of requests, the
s/include/cause/ (the page doesn't really include requests)
1.1. Overview
The format defined in this specification treats a list of header
fields as an ordered collection of name-value pairs that can include
duplicates. Names and values are considered to be opaque sequences
of octets, and the order of header fields is preserved after being
compressed and decompressed.
When we say duplicate here, it's about duplicate *names*, right?
2.1. Header List Ordering
HPACK preserves the ordering of header fields inside the header list.
An encoder MUST order header field representations in the header
block according to their ordering in the original header list. A
decoder MUST order header fields in the decoded header list according
to their ordering in the header block.
Just before we said that we do not define the encoder. Maybe reprase the
first MUST as a statement of fact.
2.3.2. Dynamic Table
The dynamic table can contain duplicate entries. Therefore,
duplicate entries MUST NOT be treated as an error by a decoder.
What's the definition of "duplicate" here? Name and value match?
3.2. Header Field Representation Processing
The processing of a header block to obtain a header list is defined
in this section. To ensure that the decoding will successfully
produce a header list, a decoder MUST obey the following rules.
"This section defines..."
4.1. Calculating Table Size
The size of the dynamic table is the sum of the size of its entries.
The size of an entry is the sum of its name's length in octets (as
defined in Section 5.2), its value's length in octets (see
Section 5.2), plus 32.
Remove one of the references to Section 5.2.
The size of an entry is calculated using the length of the name and
value without any Huffman encoding applied.
This is the first time that Huffman is mentioned; it would probably be
good to mention it earlier on.
NOTE: The additional 32 octets account for the overhead associated
with an entry. For example, an entry structure using two 64-bit
pointers to reference the name and the value of the entry, and two
64-bit integers for counting the number of references to the name and
value would have 32 octets of overhead.
maybe say "estimated overhead"?
5.1. Integer Representation
Integers are used to represent name indexes, pair indexes or string
lengths. To allow for optimized processing, an integer
representation always finishes at the end of an octet.
First use of "pair index"; at this point it's not clear what that might
be.
Pseudo-code to represent an integer I is as follows:
if I < 2^N - 1, encode I on N bits
else
encode (2^N - 1) on N bits
I = I - (2^N - 1)
while I >= 128
encode (I % 128 + 128) on 8 bits
I = I / 128
encode I on 8 bits
Pseudo-code to decode an integer I is as follows:
decode I from the next N bits
if I < 2^N - 1, return I
else
M = 0
repeat
B = next octet
I = I + (B & 127) * 2^M
M = M + 7
while B & 128 == 128
return I
We have bad experience with making indentation in artwork significant;
maybe introducing brackets would be a good idea.
This integer representation allows for values of indefinite size. It
is also possible for an encoder to send a large number of zero
values, which can waste octets and could be used to overflow integer
values. Excessively large integer encodings - in value or octet
length - MUST be treated as a decoding error. Different limits can
be set for each of the different uses of integers, based on
implementation constraints.
Having a MUST here when we don't say what "excessive" is seems like a
bad idea.
5.2. String Literal Representation
Header field names and header field values can be represented as
literal string. A literal string is encoded as a sequence of octets,
either by directly encoding the literal string's octets, or by using
a Huffman code (see [HUFFMAN]).
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| H | String Length (7+) |
+---+---------------------------+
| String Data (Length octets) |
+-------------------------------+
Figure 4: String Literal Representation
A literal string representation contains the following fields:
H: A one bit flag, H, indicating whether or not the octets of the
string are Huffman encoded.
String Length: The number of octets used to encode the string
literal, encoded as an integer with 7-bit prefix (see
Section 5.1).
String Data: The encoded data of the string literal. If H is '0',
then the encoded data is the raw octets of the string literal. If
H is '1', then the encoded data is the Huffman encoding of the
string literal.
The figure could be interpreted as if the string length is always
expressed in 7 bits, which I believe is not true.
6.2.1. Literal Header Field with Incremental Indexing
A literal header field with incremental indexing representation
results in appending a header field to the decoded header list and
inserting it as a new entry into the dynamic table.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| 0 | 1 | Index (6+) |
+---+---+-----------------------+
| H | Value Length (7+) |
+---+---------------------------+
| Value String (Length octets) |
+-------------------------------+
Figure 6: Literal Header Field with Incremental Indexing - Indexed
Name
See above. One can read this as "H" being bit 0 in the second octet; but
that's not always the case. I think it would be good to revise the
figure format.
6.3. Dynamic Table Size Update
The new maximum size MUST be lower than or equal to the last value of
the SETTINGS_HEADER_TABLE_SIZE parameter (see Section 6.5.2 of
[HTTP2]) received from the decoder and acknowledged by the encoder
(see Section 6.5.3 of [HTTP2]).
Some parts of the spec read as if HPACK doesn't have direct dependency
on HTTP/2 or HTTP in general. However, here a normative dependency is
introduced.
7.1. Probing Dynamic Table State
This is possible even over TLS, because while TLS provides
confidentiality protection for content, it only provides a limited
amount of protection for the length of that content.
Expand "TLS" on first use, and add a reference.
9. References
9.1. Normative References
[HTTP2] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
Transfer Protocol version 2",
draft-ietf-httpbis-http2-16 (work in progress),
October 2014.
November (already fixed in Git).
9.2. Informative References
[CRIME] Rizzo, J. and T. Duong, "The CRIME Attack",
September 2012, <https://docs.google.com/a/twist.com/
presentation/d/
11eBmGiHbYcHR9gL5nDyZChu_-lCa2GizeuOfaLU2HOU/
edit#slide=id.g1eb6c1b5_3_6>.
The fragment identifier on the URI appears to take as to the last slide;
maybe drop it?
[HUFFMAN] Huffman, D., "A Method for the Construction of Minimum
Redundancy Codes", Proceedings of the Institute of Radio
Engineers Volume 40, Number 9, pp. 1098-1101,
September 1952, <https://ieeexplore.ieee.org/xpl/
articleDetails.jsp?arnumber=4051119>.
That URI leads me to a broken page due to mixed content problems. The
HTTP version works fine.
[SPDY] Belshe, M. and R. Peon, "SPDY Protocol",
draft-mbelshe-httpbis-spdy-00 (work in progress),
February 2012.
It's not work in progress, and we should have a URI for it. Yes, that's
something the RFC Editor will need to figure out :-).
Appendix A. Static Table Definition
The static table (see Section 2.3.1) consists of a predefined and
unchangeable list of header fields.
The static table was created by listing the most common header fields
that are valid for messages exchanged inside a HTTP/2 connection.
Is that really true? How come we have Proxy-Auth* in it then, and also
an unregistered header field ("refresh")? I believe we need to document
how we came up with this list.
| 16 | accept-encoding | gzip, deflate |
Appendix B. Huffman Code
As an example, the code for the symbol 47 (corresponding to the ASCII
character "/") consists in the 6 bits "0", "1", "1", "0", "0", "0".
"consists of"?
This corresponds to the value 0x18 (in hexadecimal) encoded on 6
bits.
"encoded in"?
C.3.
It would be cool to have an example with repeating header fields (maybe
two instances of Cache-Control)?