re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

Yaron Sheffer <yaronf.ietf@xxxxxxxxx> · Wed, 14 Aug 2013 00:11:45 +0300

Dear authors,

Sorry that I'm submitting these comments after the end of the LC
period. I hope they can still be of use.

- The document is well written and very clearly explained.
- I am still of the opinion that this document would better be
published as an Experimental RFC. But the comments below are unrelated
to that discussion.
- The "diagnostic notation" can be used very effectively for things like 
configuration files, e.g. if an application already uses CBOR on the 
wire. Therefore I would suggest formalizing it a bit more, so that we
also have interoperability at this level.
- And since this notation is not meant as a JSON extension, this is a 
good time to introduce comments (e.g. with an initial '#') into the 
notation.
- The positive vs. negative encoding means that the parser actually 
deals with 9-, 17-, 33- and 65-bit integers. I don't think this makes it 
easier to write parsers.
- Arrays are prefixed by the number of elements but not by their length 
in bytes. And elements need not be all of the same size. So you cannot 
skip the array without fully parsing every last element. IIRC this is a 
major disadvantage compared to ASN.1 encodings.
- A puzzling change from JSON, and one that probably complicates 
implementations quite a bit, is that a map's index can be of any type, 
not just a string. And this includes mixed index types for the same map.
- And similarly to arrays, you cannot skip a map element without deep 
parsing of the element.
- I think many of the tag values are too specific, and are best left to 
applications. For example, why should the format care if the app encodes 
a UTF-8 string in base64? OTOH, I would reserve a part of the tag space 
for "private" application-specific allocations.
- One tag value you may want to consider adding is "critical" in the 
security sense of the word, i.e., an application is required to fail if 
it does not understand the value (probably best applied to map keys).
- In the "diagnostic notation", I suggest using symbolic values rather
than integers for tags, e.g. TAG_URI.
- Sec. 3: because of the need for deep parsing mentioned above, a wire 
protocol cannot be extended by adding an element that uses a new data 
type (e.g. double precision FP) unless all potential recipients 
understand the type, even though they might not need to use the data 
element.
- Type restrictions for tags should be spelled out more clearly. E.g. in 
2.4.4.2, please clarify that when this tag applies to an array or map, 
*all* the items (and potentially items of nested arrays/maps?) MUST be 
byte strings. IMHO this just adds complexity and it's best to only tag 
the atomic item.
- Text such as this (for unknown simple types): "might issue a warning, 
might stop processing altogether, might handle the error by making the 
unknown value available to the application as such, or take some other 
type of action." is a security disaster waiting to happen. Also, it does 
not allow extensibility. Even though the encoding format is nominally 
extensible, in reality you cannot add stuff because the behavior of 
existing implementations when faced with it is unpredictable.
- Similarly for unknown tags (which IMHO should be ignored). Note that 
"unknown" includes currently specified tags, because implementations are 
not required to implement all current tags.
- Another security issue, for incomplete arrays: "a parser may
completely fail the decoding, or substitute the missing data and data
items using a decoder-specific convention." This is a buffer overflow
vulnerability by a different name.
- And by the way, the entire Sec. 3 is non-normative. I suggest using
normative language for parser behavior, to ensure it is deterministic.
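
To make the integer-range and skipping comments above concrete, here is
a rough Python sketch. It is only an illustration under stated
assumptions, not a reference decoder: it handles definite-length items
only, the names read_head, decode_int and skip_item are made up for this
mail, and it relies only on the initial-byte layout (3-bit major type,
5-bit additional information) described in the draft. It shows that a
negative integer with an 8-byte argument needs 65 bits, that skipping an
array means walking every nested item, and one way to fail
deterministically on truncated input instead of substituting data.

```python
def read_head(buf, pos):
    """Return (major_type, argument, next_pos) for the item at pos."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:                  # argument sits in the initial byte
        return major, info, pos
    if info > 27:                  # indefinite length etc.: out of scope here
        raise ValueError("unsupported additional information %d" % info)
    size = 1 << (info - 24)        # 1, 2, 4 or 8 bytes follow
    if pos + size > len(buf):
        raise ValueError("truncated input")
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def decode_int(buf, pos=0):
    """Decode a major type 0 (unsigned) or 1 (negative) integer."""
    major, arg, pos = read_head(buf, pos)
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos       # down to -2**64 with an 8-byte argument
    raise ValueError("not an integer")


def skip_item(buf, pos=0):
    """Return the offset just past the item at pos, without decoding it."""
    major, arg, pos = read_head(buf, pos)
    if major in (0, 1, 7):         # ints, simple values, floats: head is all
        return pos
    if major in (2, 3):            # byte/text string: arg is a byte length
        if pos + arg > len(buf):
            raise ValueError("truncated input")
        return pos + arg
    if major == 6:                 # tag: exactly one item follows
        return skip_item(buf, pos)
    count = arg if major == 4 else 2 * arg   # array items, or map keys+values
    for _ in range(count):         # no byte length available: walk each one
        pos = skip_item(buf, pos)
    return pos


# [1, [2, 3], "abc"] followed by the integer 42
data = bytes.fromhex("830182020363616263" + "182a")
after_array = skip_item(data)          # 9
value, end = decode_int(data, after_array)   # (42, 11)
```

Note that skip_item recurses once per nesting level, so a real parser
would also need a depth limit; that is left out here to keep the sketch
short.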

Thanks,
    Yaron