> On 7 dec 2014, at 19:05, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
>
> Patrik Fältström scripsit:
>
>> But it also reference RFC7159, which doesn't require UTF-8 but instead
>> for some weird reason also allow other encodings of Unicode text. And
>> on top of that it says Byte Order Mark is not allowed.
>
> 7159 was meant to tighten the wording of 4627, not to impose additional
> constraints on it. For that, see the I-JSON draft.

The problem I have is that 7159 is not tight enough, as it allows
encodings other than UTF-8, which in turn makes the encoding not work
very well, since this draft takes for granted that each separator
character is exactly one byte.

I.e. the way I read draft-ietf-json-text-sequence (and I might be
wrong), you have specific octet values that act as separators. That
only works if the encoding is UTF-8.

See Figure 1:

> possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
>                           ; JSON text (see RFC7159)

Now, if this is NOT UTF-8, then this might be a pretty bad situation.

What I am saying is that I would like this draft to say explicitly that
the only profile of RFC7159 that can be used is the one where UTF-8 is
in use, i.e. somewhere something like "The encoding MUST be UTF-8,
although RFC7159 also allows other encodings, like UTF-16." Then, in the
security considerations section, something like "RFC7159 allows not only
UTF-8 encoding but also, for example, UTF-16, which MIGHT create
problems for a parser, depending on what data is serialized."

I.e. I want this draft to be even tighter than RFC7159.

Let me ask it this way: is there any reason to allow encodings other
than UTF-8? If so, how do you handle the encoding of the separators?

>> This together implies that first of all this draft might not lead to
>> stable implementations, secondly one can not store in JSON strings
>> that include the Byte Order Mark, and there are other unspecified
>> situations.
>
> If by that you mean that a JSON string may not contain U+FEFF, that is
> incorrect, for U+FEFF is recognized as a BOM only when placed at the
> beginning of an entity body, whereas an entity body in JSON format can
> begin only with [ or { classically, or by extension with [0-9"tfn].

Ok, so what you are saying is that a string in an attribute value in
the JSON blob can still start with U+FEFF?

If so, good, and my apologies for not understanding this when I read
the text.

   Patrik
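
To make the separator concern above concrete, here is a minimal sketch in
Python. It assumes the framing used by the draft (an RS octet, 0x1E, before
each JSON text and an LF octet, 0x0A, after it) and a naive reader that cuts
the stream at every 0x1E octet. The helper names frame and split_on_rs are
hypothetical and exist only for this illustration; they are not taken from
the draft or from any library.

```python
RS = 0x1E  # ASCII Record Separator, assumed to precede each JSON text
LF = 0x0A  # ASCII Line Feed, assumed to follow each JSON text


def frame(text: str, encoding: str) -> bytes:
    """Frame one JSON text as RS + encoded text + LF (hypothetical helper).

    The separators are always written as single octets, whatever the
    encoding of the JSON text itself.
    """
    return bytes([RS]) + text.encode(encoding) + bytes([LF])


def split_on_rs(stream: bytes) -> list:
    """Naive octet-level splitter: cut at every 0x1E octet, drop empty chunks."""
    return [chunk for chunk in stream.split(bytes([RS])) if chunk]


# A JSON string whose only character is U+1E00. In UTF-16BE this code
# point is encoded as the octets 1E 00, so it contains the RS octet value.
doc = '"\u1e00"'

utf8_stream = frame(doc, "utf-8") + frame(doc, "utf-8")
utf16_stream = frame(doc, "utf-16-be") + frame(doc, "utf-16-be")

utf8_chunks = split_on_rs(utf8_stream)
utf16_chunks = split_on_rs(utf16_stream)

print(len(utf8_chunks))   # 2: one chunk per JSON text
print(len(utf16_chunks))  # 4: each text was cut in the middle of U+1E00
```

With UTF-8, every octet of a multi-octet sequence is 0x80 or above, and
RFC7159 requires control characters such as U+001E to be escaped inside
strings, so the octet 0x1E cannot occur inside a valid JSON text. With
UTF-16 that guarantee is gone, which is the case the sketch shows.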