[I've been traveling, please excuse my late responses.]

On Sun, Dec 07, 2014 at 07:55:41PM +0100, Patrik Fältström wrote:
> > On 7 dec 2014, at 19:05, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
> >
> > Patrik Fältström scripsit:
> >
> >> But it also reference RFC7159, which doesn't require UTF-8 but instead
> >> for some weird reason also allow other encodings of Unicode text. And
> >> on top of that it says Byte Order Mark is not allowed.
> >
> > 7159 was meant to tighten the wording of 4627, not to impose additional
> > constraints on it. For that, see the I-JSON draft.
>
> The problem I have is that 7159 is not tight enough as it allows other
> encodings than UTF-8, which in turn make the encoding not work very
> well as this draft take for granted each one of the separator
> characters is one byte each.
>
> I.e. the way I read draft-ietf-json-text-sequence (and I might be
> wrong), you have specific octet values that act as separators. That
> only works if the encoding is UTF-8.

Right. I'll add text to section 2.2 saying that the JSON texts have to
be encoded in UTF-8.

> See Figure 1:
>
> > possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
> >                           ; JSON text (see RFC7159)
>
> Now, if this is NOT UTF-8, then this might be pretty bad situation.

Well, you can always fuzz test a parser... :)

But yes, the encoder should use UTF-8.

> What I am saying is that I would like this draft to explicitly say
> that the only profile of RFC7159 that can be used is when UTF-8 is in
> use, i.e. somewhere something like "The encoding MUST be UTF-8,
> although RFC7159 also allow other encodings, like UTF-16." Then in the
> security considerations section that "RFC7159 do allow not only UTF-8
> encoding but also for example UTF-16, which MIGHT create problems for
> a parser, all depending on what data is serialized."
>
> I.e. I want this draft to be even more tight than RFC7159.

I agree with this.
This was always my intent (as in: I never intended to support UTF-16
or UTF-32, say, or any other UTF, in any implementation of mine).

And I agree that this format doesn't work with UTF-16 or UTF-32 EVEN IF
the UTF were part of the MIME type and UTF-specific multi-byte
separators were used: because, at least for log-type applications where
atomicity of writes is in question, multi-byte separators won't do.

Nico
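
To make the octet-level concern in this thread concrete, here is a minimal sketch, not taken from the draft or from any implementation mentioned above. The helpers `encode_record` and `split_records` are hypothetical names, and appending a single LF octet after UTF-16 text is a simplification made only for the demonstration. It shows that splitting on the RS octet (0x1E) is safe for UTF-8 JSON texts, because JSON serializers escape U+001E inside strings and UTF-8 never uses 0x1E in a multi-byte sequence, while UTF-16 can produce 0x1E as half of an unrelated code unit.

```python
import json

RS = b"\x1e"  # record separator octet assumed by the framing discussed above


def encode_record(value, encoding="utf-8"):
    """Serialize one value as RS + JSON text + LF (hypothetical helper)."""
    text = json.dumps(value, ensure_ascii=False)
    return RS + text.encode(encoding) + b"\n"


def split_records(stream):
    """Naive octet-level split on RS, dropping empty chunks."""
    return [chunk for chunk in stream.split(RS) if chunk]


# U+011E encodes as C4 9E in UTF-8, but as 1E 01 in UTF-16-LE.
value = {"name": "\u011e"}

utf8_stream = encode_record(value) + encode_record(value)
utf16_stream = (encode_record(value, "utf-16-le")
                + encode_record(value, "utf-16-le"))

print(len(split_records(utf8_stream)))   # 2: one chunk per record
print(len(split_records(utf16_stream)))  # 4: each record is cut in half

# A literal U+001E inside a string is escaped by the serializer, so it
# never appears as a raw 0x1E octet in the UTF-8 output.
assert RS not in json.dumps("\x1e").encode("utf-8")
```

With UTF-8 each chunk parses back to the original value; with UTF-16 the splitter sees a separator in the middle of a character and hands the parser two fragments per record.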