Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

Yaron Sheffer <yaronf.ietf@xxxxxxxxx> · Sat, 17 Aug 2013 11:19:35 +0300

Hi Paul,

Please see below.

Thanks,
	Yaron

On 2013-08-17 04:48, Paul Hoffman wrote:
On Aug 15, 2013, at 3:11 PM, Yaron Sheffer <yaronf.ietf@xxxxxxxxx>
wrote:

- A parser that looks for duplicates must be able to detect that
{{"a":1, "b":2}:4, {"b":2, "a":1}:5} does in fact have a duplicate
key, because the two internal maps (used as keys) are identical. So
in general, parsers need to canonicalize maps to any depth in order
to detect duplicates. This is "complex" by any definition of the
word.

It does not need to canonicalize, but it does need to reify (or some
word that means "know what each name means"). This is not additional
code: the decoder already has that in the semantic processor. It is
only additional runtime during decoding, and only in
protocols/applications that use maps as keys.

We could say "you cannot use maps as keys in maps because it is too
hard", but then the question is where do we draw the line on "too
hard". Is it "too hard" to use arrays? They might be arrays with
arrays in them; is that too hard? Instead of the CBOR spec saying
"this is too hard for you to do", we at some point have to trust the
protocol/application developer to understand the tradeoffs.

I don't understand why you think you are shifting the complexity to the
application/protocol. Detecting duplicates is the responsibility of a
generic decoder (parser?), and any complexity will fall there, as will
any resultant security vulnerabilities.

In fact, you do NOT need to understand "what each name means"; you need
precisely a canonical order, even if we don't like the C word for
historical reasons.
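
To illustrate the canonical-order point, here is a minimal sketch, not taken
from the draft. It assumes decoded maps are modeled as a hypothetical Map
class holding (key, value) pairs, since Python dicts cannot use dicts as
keys, and that items of different types never compare equal. The names Map,
canonical and has_duplicate_keys are made up for this example.

```python
class Map:
    """Hypothetical model of a decoded map: an ordered list of (key, value) pairs."""
    def __init__(self, *pairs):
        self.pairs = list(pairs)

def canonical(item):
    """Return a hashable form of item that ignores the order of map entries."""
    if isinstance(item, Map):
        entries = [(canonical(k), canonical(v)) for k, v in item.pairs]
        # Sorting on repr gives one fixed order for any set of entries.
        return ("map", tuple(sorted(entries, key=repr)))
    if isinstance(item, list):
        return ("array", tuple(canonical(x) for x in item))
    # Assumption: the type is part of the identity, so 1 and 1.0 differ.
    return (type(item).__name__, item)

def has_duplicate_keys(m):
    """Check the top-level keys of m for duplicates, at any nesting depth of the keys."""
    seen = set()
    for key, _ in m.pairs:
        form = canonical(key)
        if form in seen:
            return True
        seen.add(form)
    return False

# The example from this thread: {{"a":1, "b":2}:4, {"b":2, "a":1}:5}
outer = Map(
    (Map(("a", 1), ("b", 2)), 4),
    (Map(("b", 2), ("a", 1)), 5),
)
print(has_duplicate_keys(outer))
```

The recursion in canonical() is where the "to any depth" cost shows up: every
map used as a key has to be walked and ordered before it can be compared.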

Since it's trivial to emulate complex keys with arrays, I would
recommend avoiding this whole thing and allowing only keys that are
simple values. And even then you need to define how data items are
compared, i.e. whether there are duplicates in {1:"a", 1.0:"a", "1":"a"}.

- Even for an unloved diagnostic notation, you want people to read
it without resorting to little pieces of paper. So a symbolic
representation (TAG_URI) is definitely better than numbers.

You keep trying to elevate the diagnostic notation; we keep
resisting.

If it's a diagnostic notation, we want people (not machines) to read it.
People can read symbolic values ("TAG_MIME") much better than numbers
("36").
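
A rough sketch of the difference, assuming a diagnostic rendering of the
form tag(content); the table and render_tagged are hypothetical, and only
the 36/TAG_MIME pairing comes from the text above:

```python
import json

# Hypothetical name table; only this one pairing is mentioned in the thread.
TAG_NAMES = {36: "TAG_MIME"}

def render_tagged(tag, content, symbolic=True):
    """Render a tagged text string as label(content) for a human reader."""
    label = TAG_NAMES.get(tag, str(tag)) if symbolic else str(tag)
    # json.dumps is used only to get a double-quoted string literal.
    return f"{label}({json.dumps(content)})"

print(render_tagged(36, "MIME-Version: 1.0", symbolic=False))
print(render_tagged(36, "MIME-Version: 1.0"))
```

Tags without an entry in the table fall back to the number, so the symbolic
form costs nothing for unknown tags.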

- I don't understand your reply re: tags when applied to arrays. Is
it up to the application to decide whether the tag applies to all
elements, to the first one, to the last 14?

My reply may have been unclear, but the text in -05 should be
clearer:

A tag always applies to the item that is directly followed by it.
Thus, if tag A is followed by tag B which is followed by data item C,
tag A applies to the result of applying tag B on data item C. That
is, a tagged item is a data item consisting of a tag and a value. The
content of the tagged item is the data item (the value) that is being
tagged.
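
The nesting described in this text can be sketched as follows. This is an
illustration only: Tagged and nest are hypothetical names, and the tag
numbers are placeholders with no assigned meaning.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Tagged:
    """A tagged item: a tag plus the single data item it applies to."""
    tag: int
    value: Any

def nest(tags, item):
    """Wrap item in tags listed in wire order (outermost tag first)."""
    for tag in reversed(tags):
        item = Tagged(tag, item)
    return item

A, B = 1001, 1002  # placeholder tag numbers
C = "data item C"

result = nest([A, B], C)
# Tag A applies to the result of applying tag B to C.
print(result == Tagged(A, Tagged(B, C)))
```

In this model each tag has exactly one content item, which may itself be a
tagged item.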

This is indeed good text. I suggest also adding: "A tag followed by an
aggregate data item (map or array) applies to all members of the data
item."

[snip]

--Paul Hoffman