I'm not concerned about this - the draft is UTF-8-only (it now explicitly
forbids UTF-16 and UTF-32) and is written on the assumption that it's common
knowledge that 7-bit ASCII (as octets with zero in the most significant bit)
is a UTF-8 subset.

Thanks,
--David

> -----Original Message-----
> From: Manger, James [mailto:James.H.Manger@xxxxxxxxxxxxxxxx]
> Sent: Thursday, December 11, 2014 5:51 PM
> To: Paul Hoffman; Black, David
> Cc: Nico Williams; General Area Review Team (gen-art@xxxxxxxx); json@xxxxxxxx;
> ops-dir@xxxxxxxx; ietf@xxxxxxxx
> Subject: RE: [Json] Gen-ART and OPS-Dir review of
> draft-ietf-json-text-sequence-10
>
> >> Abstract:
> >>
> >> This document describes the JSON text sequence format and associated
> >> media type, "application/json-seq". A JSON text sequence consists of
> >> any number of JSON texts, each prefix by an Record Separator
> >> (U+001E), and each ending with a newline character (U+000A).
> >>
> >> "any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"
>
> > This change concerns me, because it sounds like a JSON text sequence could
> > consist of JSON texts encoded in UTF-8 and other encodings. I would instead
> > prefer "any number of JSON texts, all encoded in UTF-8,".
>
> >> It also looks like ASCII names for RS and LF are being mixed w/Unicode
> >> codepoints in the second sentence in the abstract. I'm not sure
> >> that's a good thing to do, especially as the body of the draft refers
> >> to RS and LF as being ASCII. Here are a couple of changes that would
> >> remedy this:
> >>
> >> "an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
> >> "a newline character (U+000A)" -> "an ASCII newline character (0x0A)"
>
> > With John Cowan's change ("an ASCII Line Feed character (0x1E)" instead of
> > "an ASCII Record Separator (0x1E)"), that would indeed be clearer.
>
> Please no. That would give an even worse mix of UTF-8 and ASCII, bytes and
> characters, in the 1 sentence.
>
> ".. any number of JSON texts, all encoded in UTF-8, each prefixed by an
> ASCII Record Separator (0x1E) .."
>
> How about:
>
> "A JSON text sequence consists of any number of JSON texts,
> each prefixed by a Record Separator (U+001E) character, and
> each suffixed by an End of Line (U+000A) character. It is
> UTF-8 encoded."
>
> Say "Information Separator Two (U+001E)" if you really want to be pure.
>
> Mention in the body that "Record Separator" and "Information Separator Two"
> are the ASCII and Unicode names for the same character (as are "Line Feed" and
> "End of Line"), which is why RS and LF are used as ABNF names.
>
> P.S. The spec still defines the same ABNF names twice (RS, JSON-sequence):
> once as bytes; once as Unicode scalars. Yuck. Just give them different names.
>
> --
> James Manger
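
For readers following the bytes-versus-characters point in this thread, here is a minimal sketch of the framing as the abstract describes it: each JSON text prefixed by RS and ended by LF, with the whole sequence in UTF-8. The helper names `encode_json_seq` and `decode_json_seq` are hypothetical, not taken from the draft, and the decoder assumes well-formed input rather than implementing any recovery behaviour the draft may specify.

```python
import json

# U+001E and U+000A each encode to a single octet in UTF-8, and that octet
# is the same as the ASCII one, so the byte and character views coincide.
RS = b"\x1e"  # Record Separator (Unicode name: Information Separator Two)
LF = b"\x0a"  # Line Feed


def encode_json_seq(values):
    """Serialize each value as a UTF-8 JSON text, prefixed by RS, ended by LF."""
    return b"".join(
        RS + json.dumps(v, ensure_ascii=False).encode("utf-8") + LF
        for v in values
    )


def decode_json_seq(data):
    """Split on RS and parse every non-empty chunk as a UTF-8 JSON text."""
    return [
        json.loads(chunk.decode("utf-8"))
        for chunk in data.split(RS)
        if chunk.strip()
    ]
```

Splitting on the raw octet 0x1E is safe here because JSON requires control characters inside strings to be escaped, so a conforming JSON text never contains a literal RS.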