Re: [secdir] [Json] secdir review of draft-ietf-jsonbis-rfc7159bis-03

Nico Williams <nico@xxxxxxxxxxxxxxxx> · Mon, 13 Mar 2017 13:15:34 -0500

On Mon, Mar 13, 2017 at 09:14:16AM +0100, Julian Reschke wrote:
> So the changes in RFC 7159 allow top-level strings, so we can't rely on the
> first *two* characters being US-ASCII. But we *can* rely on the first one
> being US-ASCII, no?

Correct.

If one OR two of the first four bytes are NULs, then the encoding is
UTF-16 (or something else, or invalid).

> So the following should still be correct:
> 
> >   Since the first character of a JSON text will always be an ASCII
> >   character [RFC0020], it is possible to determine whether an octet
> >   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
> >   at the pattern of nulls in the first four octets.
> >
> >           00 00 00 xx  UTF-32BE
> >           00 xx xx xx  UTF-16BE
> >           xx 00 00 00  UTF-32LE
> >           xx 00 xx xx  UTF-16LE
> >           xx xx xx xx  UTF-8

Count the number of NULs in the first four bytes:

 - if zero -> UTF-8
 - if one or two -> UTF-16
 - if three -> UTF-32
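
A minimal sketch of that count in Python, for illustration only.  The
function name guess_json_encoding is hypothetical, and the sketch
assumes there is no BOM, that the first character is ASCII, and that at
least four octets are available (shorter inputs are rejected here purely
to keep the example small).  The result is a guess: one or two NULs
could also mean something other than UTF-16, or an invalid text.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding family from the number of NULs in the first four octets."""
    head = data[:4]
    if len(head) < 4:
        # Simplification for this sketch: do not try to classify short inputs.
        raise ValueError("need at least four octets")

    nuls = head.count(0)  # number of 0x00 octets among the first four

    if nuls == 0:
        return "UTF-8"
    if nuls in (1, 2):
        return "UTF-16"   # or something else, or invalid
    if nuls == 3:
        return "UTF-32"
    # Four NULs cannot start a JSON text whose first character is ASCII.
    raise ValueError("first four octets are all NUL")
```

This only picks the family; it does not tell BE from LE.  The table
quoted above does that by where the NULs sit.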

Nico
--