RE: Best practice for data encoding?

> From: Jeffrey Hutzelman [mailto:jhutz@xxxxxxx] 

> To be pedantic, ASN.1 is what its name says it is - a notation.
> The properties you go on to describe are those of BER; other 
> encodings have other properties.  For example, DER adds 
> constraints such that there are no longer multiple ways to 
> encode the same thing.  Besides simplifying implementations, 

Hate to bust your bubble here, but DER encoding is vastly more complex than any other encoding. It is certainly not simpler than the BER encoding.

The reason for this is that in DER encoding each chunk of data is encoded using the definite length encoding, in which each data structure is preceded by a length descriptor. In addition to being much more troublesome to decode than a simple end-of-structure marker such as ), }, or </>, it is considerably more complex to code because the length descriptor is itself a variable-length integer.
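To make the point concrete, here is a minimal Python sketch (mine, not from the original message) of the DER definite-length descriptor: lengths under 128 take the short form, anything longer takes the long form, where the first octet says how many length octets follow.

```python
def der_length(n: int) -> bytes:
    """Encode a DER definite-length descriptor.

    Short form: one octet, value 0..127.
    Long form: 0x80 | count, then `count` big-endian length octets.
    """
    if n < 0x80:
        return bytes([n])  # short form: the length itself
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form
```

So a length of 5 is one byte, while a length of 300 needs three; the descriptor's own size depends on the value it describes.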

The upshot of this is that it is impossible to write an LR(1), single-pass encoder for DER. In order to encode the structure you have to recursively size each substructure before the first byte of the enclosing structure can be emitted.


> this also makes it possible to compare cryptographic hashes 
> of DER-encoded data; X.509 and Kerberos both take advantage 
> of this property. 

I am not aware of any X.509 system that relies on this property. If there is such a system, it is certainly not making use of the ability to reduce a DER-encoded structure to X.500 data and reassemble it. Almost none of the PKIX applications did this properly until recently.

X.509 certs are exchanged as opaque binary blobs by all rational applications. 

> > Then there are MACRO definitions, VALUE specifications, and an even
> > more complex definition of extension capabilities. In short, ASN.1
> > is vastly more complex than the average TLV encoding. The higher
> > rate of errors is thus not entirely surprising.
> 
> There certainly is a rich set of features (read: complexity) in both
> the ASN.1 syntax and its commonly-used encodings.  However, I don't
> think that's the real source of the problem.  There seem to be a lot
> of ad-hoc ASN.1 decoders out there that people have written as part
> of some other protocol, instead of using an off-the-shelf
> compiler/encoder/decoder;

That's because most of the off-the-shelf compiler/encoders have historically been trash.

Where do you think all the bungled DER implementations came from?

> I also suspect that a number of the problems found have 
> nothing to do with decoding ASN.1 specifically, and would 
> have come up had other approaches been used.  For example, 
> several of the problems cited earlier were buffer overflows 
> found in code written well before the true impact of that 
> problem was well understood.  

Before the 1960s? I very much doubt it.

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf

