Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

Paul Hoffman <paul.hoffman@xxxxxxxx> · Fri, 16 Aug 2013 18:48:30 -0700

On Aug 15, 2013, at 3:11 PM, Yaron Sheffer <yaronf.ietf@xxxxxxxxx> wrote:

> - A parser that looks for duplicates must be able to detect that {{"a":1, "b":2}:4, {"b":2, "a":1}:5} does in fact have a duplicate key, because the two internal maps (used as keys) are identical. So in general, parsers need to canonicalize maps to any depth in order to detect duplicates. This is "complex" by any definition of the word.

It does not need to canonicalize, but it does need to reify (or some word that means "know what each name means"). This is not additional code: the decoder already has that in the semantic processor. It is only additional runtime during decoding, and only in protocols/applications that use maps as keys.
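As a rough illustration of that point (a sketch only, not text from the draft): a decoder that already holds each key as a decoded value can compare keys by value instead of by encoded bytes. The names Map, freeze, and find_duplicate_key are hypothetical, and the sketch assumes that keys of different types (for example 1 and 1.0) are distinct, which is a choice the protocol would have to make.

```python
class Map:
    """Hypothetical decoded CBOR map: an ordered list of (key, value) pairs."""
    def __init__(self, pairs):
        self.pairs = list(pairs)


def freeze(item):
    """Return a hashable stand-in for a decoded item that ignores map ordering."""
    if isinstance(item, Map):
        # A frozenset of frozen pairs compares equal regardless of pair order.
        return ("map", frozenset((freeze(k), freeze(v)) for k, v in item.pairs))
    if isinstance(item, (list, tuple)):
        # Arrays are ordered, so element order is preserved.
        return ("array", tuple(freeze(x) for x in item))
    # Assumption: values of different types never count as the same key.
    return ("atom", type(item).__name__, item)


def find_duplicate_key(pairs):
    """Return the first key that repeats an earlier key by value, else None."""
    seen = set()
    for key, _value in pairs:
        frozen = freeze(key)
        if frozen in seen:
            return key
        seen.add(frozen)
    return None


# The example from the quoted comment: {{"a":1, "b":2}:4, {"b":2, "a":1}:5}
outer = [
    (Map([("a", 1), ("b", 2)]), 4),
    (Map([("b", 2), ("a", 1)]), 5),
]
duplicate = find_duplicate_key(outer)
```

The cost is extra runtime proportional to the size of each key, and it is only paid when keys are themselves maps or arrays.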

We could say "you cannot use maps as keys in maps because it is too hard", but then the question is where do we draw the line on "too hard". Is it "too hard" to use arrays? They might be arrays with arrays in them; is that too hard? Instead of the CBOR spec saying "this is too hard for you to do", we at some point have to trust the protocol/application developer to understand the tradeoffs.

> - Even for an unloved diagnostic notation, you want people to read it without resorting to little pieces of paper. So a symbolic representation (TAG_URI) is definitely better than numbers.

You keep trying to elevate the diagnostic notation; we keep resisting.

> - I don't understand your reply re: tags when applied to arrays. Is it up to the application to decide whether the tag applies to all elements, to the first one, to the last 14?

My reply may have been unclear, but the text in -05 should be clearer:

A tag always applies to the item that is directly followed by it. Thus, if tag A is followed by tag B which is followed by data item C, tag A applies to the result of applying tag B on data item C. That is, a tagged item is a data item consisting of a tag and a value. The content of the tagged item is the data item (the value) that is being tagged.
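To make the nesting concrete, here is a minimal sketch of a decoder that handles only unsigned integers, short text strings, and tags. It is an illustration under those assumptions, not a complete or normative decoder; Tagged, read_head, and decode_item are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A tagged item: a tag number plus the single data item it applies to."""
    tag: int
    content: object


def read_head(data, pos):
    """Read one initial byte and its argument; return (major, value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        return major, int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError("additional information not handled in this sketch")


def decode_item(data, pos=0):
    """Decode one data item; return (item, next_pos)."""
    major, value, pos = read_head(data, pos)
    if major == 0:  # unsigned integer
        return value, pos
    if major == 3:  # text string of `value` bytes
        return data[pos:pos + value].decode("utf-8"), pos + value
    if major == 6:  # tag: applies to exactly the one item that follows it
        content, pos = decode_item(data, pos)
        return Tagged(value, content), pos
    raise ValueError("major type not handled in this sketch")


# Tag 1, then tag 2, then the unsigned integer 5.
item, end = decode_item(bytes([0xC1, 0xC2, 0x05]))
```

Here item is Tagged(1, Tagged(2, 5)): the outer tag applies to the tagged item as a whole. If the item following a tag is an array, the tag applies to the array, not to any subset of its elements.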

> - Regarding error handling and security, I am slightly happier with -05, but not by much. For some reason you avoid using 2119 language in places where I would expect: parsers SHOULD include a "strict mode". A strict mode parser MUST fail when the syntax is broken (sec. 3.3, 3.4, and also 3.5). And so forth.

This feels like creep to me, but Carsten wants to add more about strict mode in -06.

> - Unknown tags: you specifically allow decoders (Sec. 3.5) to not implement any tags they don't like, and then you specify the behavior of the decoder as "do whatever you feel like" (and this is in -05). So a sender cannot rely on *any* tag being implemented by the receiver, and cannot expect deterministic behavior if any are not. This is a security issue IMO, but also an interoperability thing.

In Carsten's new wording for strict mode, the decoder will not be able to ignore any tag.

--Paul Hoffman