Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Nico Williams <nico@xxxxxxxxxxxxxxxx> · Mon, 8 Dec 2014 21:30:57 -0600

[I've been traveling, please excuse my late responses.]

On Sun, Dec 07, 2014 at 07:55:41PM +0100, Patrik Fältström wrote:
> > On 7 dec 2014, at 19:05, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
> > Patrik Fältström scripsit:
> > 
> >> But it also references RFC7159, which doesn't require UTF-8 but instead
> >> for some weird reason also allows other encodings of Unicode text. And
> >> on top of that it says Byte Order Mark is not allowed.
> > 
> > 7159 was meant to tighten the wording of 4627, not to impose additional
> > constraints on it.  For that, see the I-JSON draft.
> 
> The problem I have is that 7159 is not tight enough, as it allows
> encodings other than UTF-8, which in turn makes the encoding not work
> very well, as this draft takes for granted that each of the separator
> characters is one byte.
> 
> I.e. the way I read draft-ietf-json-text-sequence (and I might be
> wrong), you have specific octet values that act as separators. That
> only works if the encoding is UTF-8.

Right.  I'll add text to section 2.2 saying that the JSON texts have to
be encoded in UTF-8.
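
To illustrate why octet-valued separators depend on UTF-8, here is a minimal Python sketch. It is not taken from the draft; the helper name contains_rs_octet and the sample character are assumptions chosen for illustration. It shows that the octet 0x1E (RS) can appear inside the UTF-16 encoding of an ordinary letter, whereas in UTF-8 that octet only ever encodes U+001E itself.

```python
RS = 0x1E  # ASCII Record Separator, the delimiter octet discussed in this thread


def contains_rs_octet(text: str, encoding: str) -> bool:
    """Return True if the encoded form of `text` contains the octet 0x1E."""
    return RS in text.encode(encoding)


# U+011E (LATIN CAPITAL LETTER G WITH BREVE) is an ordinary letter, not a separator.
sample = "\u011e"

# UTF-8 encodes U+011E as the two octets C4 9E: no 0x1E octet appears.
utf8_hit = contains_rs_octet(sample, "utf-8")

# UTF-16LE encodes U+011E as the two octets 1E 01: a false separator appears.
utf16_hit = contains_rs_octet(sample, "utf-16-le")
```

A reader that scans for the 0x1E octet would therefore split a UTF-16 text in the middle of a character.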

> See Figure 1:
> 
> > possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
> >                                ; JSON text (see RFC7159)
> 
> Now, if this is NOT UTF-8, then this might be a pretty bad situation.

Well, you can always fuzz test a parser... :)

But yes, the encoder should use UTF-8.
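
As a rough sketch of the "attempt to parse as UTF-8-encoded JSON text" rule quoted above, the following Python splits a byte stream on RS and tries each chunk. The function names encode_sequence and parse_sequence are hypothetical, and the framing assumed here (RS, then the JSON text, then a line feed) as well as the choice to silently skip unparseable chunks are assumptions for illustration, not a statement of what the draft mandates.

```python
import json

RS = b"\x1e"  # single-octet record separator


def encode_sequence(values):
    """Frame each value as RS + UTF-8 JSON text + LF (assumed framing)."""
    return b"".join(
        RS + json.dumps(v, ensure_ascii=False).encode("utf-8") + b"\n"
        for v in values
    )


def parse_sequence(data: bytes):
    """Split on RS and keep only the chunks that parse as UTF-8 JSON."""
    results = []
    for chunk in data.split(RS):
        if not chunk:
            continue  # nothing between two separators, or before the first one
        try:
            results.append(json.loads(chunk.decode("utf-8")))
        except ValueError:
            # Covers both invalid UTF-8 (UnicodeDecodeError) and invalid JSON
            # (json.JSONDecodeError); this sketch simply skips such chunks.
            continue
    return results
```

With this sketch, a chunk that was written in UTF-16 fails to parse and is dropped rather than being misread.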

> What I am saying is that I would like this draft to explicitly say
> that the only profile of RFC7159 that can be used is when UTF-8 is in
> use, i.e. somewhere something like "The encoding MUST be UTF-8,
> although RFC7159 also allows other encodings, like UTF-16." Then in the
> security considerations section that "RFC7159 does allow not only UTF-8
> encoding but also for example UTF-16, which MIGHT create problems for
> a parser, all depending on what data is serialized."
> 
> I.e. I want this draft to be even more tight than RFC7159.

I agree with this.  This was always my intent (as in: I never intended
to support UTF-16 or UTF-32, say, or any other UTF, in any
implementation of mine).

And I agree that this format doesn't work with UTF-16 or UTF-32 EVEN IF
the UTF were part of the MIME type and UTF-specific multi-byte
separators were used, because, at least for log-type applications where
atomicity of writes is in question, multi-byte separators won't do.
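
For the log-type case, a minimal Python sketch of an appending writer might look like the following. The function name append_record and the RS + JSON + LF framing are assumptions for illustration; the sketch issues one write call per record on a descriptor opened with O_APPEND, and it makes no claim about what atomicity any particular operating system or filesystem actually guarantees.

```python
import json
import os

RS = b"\x1e"  # single-octet record separator


def append_record(path: str, value) -> bool:
    """Append one framed record with a single write call.

    Returns True if the whole record was written, False on a short write.
    """
    record = RS + json.dumps(value).encode("utf-8") + b"\n"
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, record)
    finally:
        os.close(fd)
    return written == len(record)
```

One reading of the point above is that a one-octet separator is either present or absent, so even after a short write the next record still starts with an intact RS, whereas a multi-byte separator could itself be cut in half.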

Nico