RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

"Manger, James" <James.H.Manger@xxxxxxxxxxxxxxxx> · Fri, 12 Dec 2014 09:50:53 +1100

>> Abstract:
>> 
>>   This document describes the JSON text sequence format and associated
>>   media type, "application/json-seq".  A JSON text sequence consists of
>>   any number of JSON texts, each prefix by an Record Separator
>>   (U+001E), and each ending with a newline character (U+000A).
>> 
>> "any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"

>This change concerns me, because it sounds like a JSON text sequence could consist of JSON texts encoded in UTF-8 and other encodings. I would instead prefer "any number of JSON texts, all encoded in UTF-8,".

>> It also looks like ASCII names for RS and LF are being mixed w/Unicode 
>> codepoints in the second sentence in the abstract.  I'm not sure 
>> that's a good thing to do, especially as the body of the draft refers 
>> to RS and LF as being ASCII.  Here are a couple of changes that would remedy this:
>> 
>>   "an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
>>   "a newline character (U+000A)" -> "an ASCII newline character (0x0A)"

>With John Cowan's change ("an ASCII Line Feed character (0x0A)" instead of "an ASCII newline character (0x0A)"), that would indeed be clearer.

Please no. That would give an even worse mix of UTF-8 and ASCII, bytes and characters, in a single sentence:

  ".. any number of JSON texts, all encoded in UTF-8, each prefixed by an ASCII Record Separator (0x1E) .."

How about:

  "A JSON text sequence consists of any number of JSON texts,
   each prefixed by a Record Separator (U+001E) character, and
   each suffixed by an End of Line (U+000A) character. It is
   UTF-8 encoded."

Say "Information Separator Two (U+001E)" if you really want to be pure.

Mention in the body that "Record Separator" and "Information Separator Two" are the ASCII and Unicode names for the same character (as are "Line Feed" and "End of Line"), which is why RS and LF are used as ABNF names.
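To illustrate the framing described above, here is a minimal Python sketch, not text from the draft. The names RS, LF, encode_json_seq and decode_json_seq are hypothetical, and the decoder is deliberately lenient: it does not attempt the draft's handling of truncated or malformed texts. It treats RS and LF as characters and applies UTF-8 once to the whole sequence. Because both code points are below U+0080, each one encodes to the single byte of the same value.

```python
import json

# Hypothetical names for this sketch only.
RS = "\u001e"  # Record Separator, a.k.a. Information Separator Two
LF = "\u000a"  # Line Feed, a.k.a. End of Line


def encode_json_seq(values):
    """Prefix each JSON text with RS, suffix it with LF, then UTF-8 encode."""
    text = "".join(RS + json.dumps(value) + LF for value in values)
    return text.encode("utf-8")


def decode_json_seq(data):
    """Lenient decoder: split on RS and parse every non-empty chunk."""
    text = data.decode("utf-8")
    # A JSON string cannot contain a raw U+001E (control characters must be
    # escaped), so a raw RS can only be a separator between texts.
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


encoded = encode_json_seq([{"a": 1}, "x"])
decoded = decode_json_seq(encoded)
```

Describing the format in terms of characters plus a single "UTF-8 encoded" statement matches this structure: the separators are defined once, and the byte values 0x1E and 0x0A fall out of the encoding.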

P.S. The spec still defines the same ABNF names twice (RS, JSON-sequence): once as bytes; once as Unicode scalars. Yuck. Just give them different names.

--
James Manger