Re: Last Call: <draft-bormann-cbor-04.txt> (Concise Binary Object Representation (CBOR)) to Proposed Standard

Carsten Bormann <cabo@xxxxxxx> · Tue, 6 Aug 2013 21:11:23 +0200

>> 2) No support for tag compression.

(I assume this was about map keys, not about tags.)

> That's an interesting requirement, and one that I think could be added to
> the design if there were others that felt motivated to help.  I think I
> can see a way that it could be added later: create a new tag that precedes
> a map of string-to-int conversions.

I'd probably do it the other way around:

	tagN([{1: "foo", 2: "bar"}, ...abbreviated data item...])
where an abbreviated data item of the form
	[1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]
would then be interpreted as
	[1, 2, 3, {"foo": "beer", "bar": "wine", "baz": 1}, 5, 6]

Yes, processing of this kind is easy to add as a tag.
If the first parameter is instead a URI that identifies the dictionary (preferably using the ni: scheme), the message would not have to carry a large dictionary around.
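
For illustration, the key expansion described above might be sketched in Python as follows. This is only a sketch: it assumes the content of the (not yet defined) tagN has already been decoded into native lists and dicts, and the function names expand_keys and expand_tagged are hypothetical.

```python
def expand_keys(item, table):
    """Recursively replace abbreviated map keys with their long forms.

    Only map keys are looked up in `table`; values and array elements
    are left alone, so the integers 1, 2, 3 in an array stay as they are.
    """
    if isinstance(item, dict):
        return {table.get(k, k): expand_keys(v, table) for k, v in item.items()}
    if isinstance(item, list):
        return [expand_keys(v, table) for v in item]
    return item


def expand_tagged(content):
    """Expand the two-element tag content [dictionary, abbreviated data item]."""
    table, data = content
    return expand_keys(data, table)


# The example from the message above.
content = [{1: "foo", 2: "bar"},
           [1, 2, 3, {1: "beer", 2: "wine", "baz": 1}, 5, 6]]
expanded = expand_tagged(content)
print(expanded)
```

Keys that are not in the dictionary (here "baz") pass through unchanged, which matches the interpretation given above.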

> However, my intuition is that this wouldn't have radically better behavior
> than gzip, and so I'd like to see some numbers to prove that the
> complexity was worthwhile.

I share that intuition.  CBOR is also intended to be useful in environments where running a full compression algorithm is impractical; there, such a scheme could still have benefits.

>> The first one is my main complaint. I want to be able to use the binary
>> and text JSON encodings interchangeably and not have the upper layers to
>> have to bother with it at all.

(The applications I have in mind use media types, but:)

> I think I understand this.  I could see where my CBOR event-based parser
> could also take JSON in, and generate the exact same events.  I might even
> do that as a proof of concept.  Could you say more about what in CBOR you
> think violates this?

Well, if you don't have a media type and don't know whether you'll get a JSON text or a CBOR data item, you may need to distinguish them mechanically.
For example, the following six characters can occur at the start of a JSON text.
Each is also valid as the first (or only) byte of a CBOR data item:

Byte    JSON meaning                CBOR interpretation

%x20  ; Space                       -1
%x09  ; Horizontal tab              9
%x0A  ; Line feed or New line       10
%x0D  ; Carriage return             13
%x5B  ; [ left square bracket       starts byte string
%x7B  ; { left curly bracket        starts UTF-8 string

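(Well, for a valid JSON text starting with [ or {, a heuristic might notice that the string data item a CBOR parser sees is unrealistically large: 0x5B and 0x7B each announce an 8-byte length, which would be read from the JSON characters that follow.)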
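
The CBOR column of the table can be reproduced mechanically. The Python sketch below assumes the draft's initial-byte layout (major type in the top three bits, additional information in the low five bits); the helper name describe_initial_byte is hypothetical, and only the cases needed for the table are handled in detail.

```python
MAJOR_TYPES = [
    "unsigned integer", "negative integer", "byte string", "text string",
    "array", "map", "tag", "simple/float",
]


def describe_initial_byte(b):
    """Describe how a CBOR decoder would read `b` as an initial byte."""
    major = b >> 5        # top three bits: major type
    info = b & 0x1F       # low five bits: additional information
    if major == 0 and info < 24:
        return str(info)              # small unsigned integer, value is inline
    if major == 1 and info < 24:
        return str(-1 - info)         # small negative integer, value is inline
    return f"starts {MAJOR_TYPES[major]} (additional info {info})"


for b in (0x20, 0x09, 0x0A, 0x0D, 0x5B, 0x7B):
    print(f"0x{b:02X}: {describe_initial_byte(b)}")
```

For 0x5B and 0x7B the additional information is 27, meaning the next eight bytes carry the string length; that is why a JSON text misread as CBOR would appear to announce an enormous string.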

If a CBOR application does require initial signature bytes for self-description purposes, I would suggest using something like

	0xd8 0xf8 ...data item...

which decodes as tag248(data item); we could define 248 as a no-op tag.
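
A receiver could then check for that prefix before choosing a parser. The Python sketch below assumes the two-byte signature suggested above; tag 248 as a no-op is only a proposal at this point, and the helper names are hypothetical.

```python
# Tag head: 0xD8 = major type 6 (tag) with a one-byte tag number following,
# 0xF8 = 248. Neither byte can start a JSON text.
SIGNATURE = bytes([0xD8, 0xF8])


def looks_like_tagged_cbor(data):
    """Return True if `data` begins with the proposed signature bytes."""
    return data[:2] == SIGNATURE


def strip_signature(data):
    """Return the enclosed data item bytes, or `data` unchanged if untagged."""
    return data[2:] if looks_like_tagged_cbor(data) else data


print(looks_like_tagged_cbor(bytes([0xD8, 0xF8, 0x01])))  # signature + integer 1
print(looks_like_tagged_cbor(b'{"a": 1}'))                # a JSON text
```

Input without the prefix is not necessarily JSON; it may simply be CBOR that omits the signature, so the check only helps where the application requires the prefix.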

(I'm still working on your other message -- lots of juicy input, thank you!)

Grüße, Carsten