Re: [Json] secdir review of draft-ietf-jsonbis-rfc7159bis-03

ned+ietf@xxxxxxxxxxxxxxxxx · Sat, 11 Mar 2017 07:41:17 -0800 (PST)

On 2017-03-11 03:08, John Cowan wrote:
>
> On Thu, Mar 9, 2017 at 12:53 AM, Benjamin Kaduk <kaduk@xxxxxxx> wrote:
>
>     If that's what's supposed to happen, it should probably be more
>     clear, yes.  (But aren't there texts that have valid interpretations
>     in multiple encodings?)
>
>
> Not if the content is well-formed JSON and the only possible encodings
> are UTF-8, UTF-16, and UTF-32.  It suffices to examine the first four
> bytes of the input.  If there are no NUL bytes in the first four bytes,
> it is UTF-8; if there are two NUL bytes, it is UTF-16; if there are
> three NUL bytes, it is UTF-32.  This works because the grammar requires
> the first character to be in the ASCII repertoire, and the NUL
> *character* (U+0000) is not allowed at all.

Good explanation. Maybe the spec should include it.

+1

This exact issue just came up in a media type review, where someone
specified a charset parameter because they weren't aware of this algorithm.

It would be very helpful to have this text in the RFC.
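
For illustration, the rule quoted above could be sketched in Python
roughly as follows. This is only a sketch under stated assumptions: the
function name detect_json_encoding is made up for this example, the input
is assumed to be well-formed JSON with no byte order mark, and the
byte-order check and the fallback for inputs with a single NUL byte or
fewer than four bytes are additions that the quoted description does not
spell out.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from its first four bytes.

    Assumes well-formed JSON in UTF-8, UTF-16, or UTF-32 with no BOM.
    (Hypothetical helper, not taken from any specification text.)
    """
    head = data[:4]
    nuls = head.count(0)

    if nuls == 0:
        # No NUL bytes at all: UTF-8.
        return "utf-8"

    if len(head) == 4 and nuls == 3:
        # One ASCII character stored in four bytes: UTF-32.
        # Assumption: a leading NUL means big-endian.
        return "utf-32-be" if head[0] == 0 else "utf-32-le"

    # Remaining cases are treated as UTF-16. Two NULs is the case named
    # in the quoted rule; a single NUL (short input, or a non-ASCII
    # second character) is handled the same way here as an assumption.
    return "utf-16-be" if head[0] == 0 else "utf-16-le"


if __name__ == "__main__":
    text = '{"a": 1}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        raw = text.encode(codec)
        print(codec, "->", detect_json_encoding(raw))
```

The returned strings are Python codec names, so the result can be passed
straight to bytes.decode() before handing the text to a JSON parser.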

				Ned