Re: Appeal from Phillip Hallam-Baker on the publication of RFC 7049 on the Standards Track

On Thu, Feb 20, 2014 at 2:22 AM, Eliot Lear <lear@xxxxxxxxx> wrote:
<no hat>
On 2/20/14, 2:28 AM, Mark Nottingham wrote:
> On 20 Feb 2014, at 11:37 am, Phillip Hallam-Baker <hallam@xxxxxxxxx> wrote:
>
>> My main concern is the process question. I really don't care whether CBOR is a PROPOSED STANDARD or whatever. What I do care about is if I am told that I have to use it because that is the IETF standard for binary encoding. And what I care most about is the risk that this approach of 'its our ball and only we will decide who gets to play' is going to be repeated.
> I have to agree with Phillip on this point, and I hope the answer is uncontroversial -- that just by virtue of being an IETF standard, we don't start requiring people to use something already defined if their use case is vaguely similar.
>
> When something is a standard, it means you need to use it in the way specified; it doesn't mean you have to choose to use it, even in other standards.

Yeah, we're off the rails here, and it's becoming a bad habit.  People
seem to like playing "what if" games about how bad things can get if
everyone loses their heads.  WGs and spec developers should always use
what makes sense (standard or no).  Rough consensus and running code,
thank you very much.

Oddly enough, a compact format for protocol encodings is one kind of specification that never needs to be implemented in order to have value.

Back in the XKMS vs SCVP days a journalist tried to compare the two specs. Since all they understood was the message size, that was the basis for comparison. An SCVP message is maybe 1KB and an XKMS message might be as much as 3KB (I am guessing here).

Given that both easily fit into a single IP packet and both involve public key cryptography, the size of the message would be completely irrelevant even if the two protocols were equivalent (which they are not). But the story was taken up and used as FUD.

If we had had a compact encoding for XML, we could easily have defeated the FUD by pointing out that people who care can use the efficient encoding option and the 'problem' goes away.


The issue can also come up at the design stage. Earlier this morning someone proposed the following as an example of a JSON microformat:

microformat GPSLocation;
   // A GPSLocation is a pair of comma separated floating-point
   // numbers representing longitude and latitude.
   // e.g. "location": "0.0,51.5"

But this is not using JSON encoding at all; it is a string containing two decimal fractions separated by a comma. The JSON encoding would be:

"location" : {"X":0.0,"Y":51.5}

The justification for the microformat is, of course, byte shaving. And of course the example, being artificial, is unrepresentative. In the normal case the comparison would be

"210.012345,51.52232" vs {"X" : 210.012345,"Y" : 51.52232} 

The difference is 3 control bytes for the string form versus 9 for the tagged version, which isn't actually a lot of overhead for a tagged text format.
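
The arithmetic is easy to check mechanically. A throwaway sketch (counting the minified tagged form, without the spaces shown above; everything that is not part of the two numbers is treated as overhead):

    # Bytes that are not part of the two numeric values themselves.
    string_form = '"210.012345,51.52232"'
    tagged_form = '{"X":210.012345,"Y":51.52232}'
    values = "210.012345" + "51.52232"

    print(len(string_form) - len(values))  # 3  (two quotes and a comma)
    print(len(tagged_form) - len(values))  # 11 (9 control bytes plus the two one-byte tag names)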


But imagine we have an efficient binary encoding that can represent each of those numbers in 5 bytes rather than 10. The off-the-shelf efficient binary encoding is much more efficient than the hand-coded format.
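
To make the 5-byte figure concrete: in CBOR (RFC 7049, the spec under appeal) a single-precision float is one initial byte (0xfa) followed by four bytes of IEEE 754 binary32. A sketch using only Python's standard struct module (the text forms and the choice to drop to single precision are mine):

    import struct

    longitude, latitude = 210.012345, 51.52232

    micro_text = '210.012345,51.52232'               # 19 bytes of text
    tagged_text = '{"X":210.012345,"Y":51.52232}'    # 29 bytes of text

    # One CBOR single-precision float: initial byte 0xfa + 4 bytes of binary32,
    # i.e. 5 bytes per number (at reduced precision).
    def cbor_float32(value):
        return b'\xfa' + struct.pack('>f', value)

    binary = cbor_float32(longitude) + cbor_float32(latitude)
    print(len(micro_text), len(tagged_text), len(binary))  # 19 29 10

(Wrapping the pair in a CBOR array would add one more byte for the array header.)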

Having a compact binary notation available provides a tool that can be swung when someone is messing about with a microformat that is only going to cause grief. If a specification is using JSON encoding then it should use JSON encoding, not JSON plus a bunch of poorly thought out ad hoc hacks that seemed like a good idea at the time.

When people reach for regular expressions, I reach for the sick bag.

Regular expressions are a very powerful tool that can be used to create more complexity in fewer lines of code than any other language I know (including APL). They are like using GOTOs. Every non-trivial program, without exception, has GOTO statements underneath, but in a structured programming language the programmer does not code them directly or need to.

So one of the challenges in using XML or JSON or any other regular data encoding in a standards effort is to block attempts to sneak in other encodings by way of 'microformats' that always look like such a good idea until they have to be coded.


Having a binary encoding for the data format at hand allows the microformats to be swatted away. Doing better than a text encoding is pretty easy. Doing better than a well designed binary encoding is actually hard.


--
Website: http://hallambaker.com/
