RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

"Black, David" <david.black@xxxxxxx> · Thu, 11 Dec 2014 23:04:36 +0000

I'm not concerned about this - the draft is UTF-8-only (it now explicitly
forbids UTF-16 and UTF-32) and is written on the assumption that it's common
knowledge that 7-bit ASCII (as octets with zero in the most significant bit)
is a UTF-8 subset.
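
To make that assumption concrete, here is a minimal Python sketch (not taken from the draft). It checks that RS and LF each encode to the same single octet in ASCII and in UTF-8, then frames a few JSON texts as RS + text + LF. The helper names encode_json_seq and decode_json_seq are hypothetical, and the decoder makes no attempt to recover truncated or malformed texts.

```python
import json

RS = "\x1e"  # Record Separator, U+001E
LF = "\n"    # Line Feed, U+000A

# Code points below U+0080 encode to one octet in UTF-8, identical to ASCII.
assert RS.encode("utf-8") == RS.encode("ascii") == b"\x1e"
assert LF.encode("utf-8") == LF.encode("ascii") == b"\x0a"


def encode_json_seq(values):
    """Hypothetical helper: frame each value as RS + JSON text + LF, in UTF-8."""
    framed = (RS + json.dumps(v, ensure_ascii=False) + LF for v in values)
    return "".join(framed).encode("utf-8")


def decode_json_seq(data):
    """Hypothetical helper: split on RS and parse every non-empty chunk.

    Splitting on RS is safe here because a valid JSON text never contains a
    raw U+001E; inside strings it must be escaped as \\u001e.
    """
    chunks = data.decode("utf-8").split(RS)
    return [json.loads(chunk) for chunk in chunks if chunk.strip()]


data = encode_json_seq([{"a": 1}, "é", 3])
print(data)
print(decode_json_seq(data))
```

Non-ASCII content such as "é" occupies multiple octets with the high bit set, so it can never be confused with the RS or LF octets.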

Thanks,
--David

> -----Original Message-----
> From: Manger, James [mailto:James.H.Manger@xxxxxxxxxxxxxxxx]
> Sent: Thursday, December 11, 2014 5:51 PM
> To: Paul Hoffman; Black, David
> Cc: Nico Williams; General Area Review Team (gen-art@xxxxxxxx); json@xxxxxxxx;
> ops-dir@xxxxxxxx; ietf@xxxxxxxx
> Subject: RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10
> 
> >> Abstract:
> >>
> >>   This document describes the JSON text sequence format and associated
> >>   media type, "application/json-seq".  A JSON text sequence consists of
> >>   any number of JSON texts, each prefix by an Record Separator
> >>   (U+001E), and each ending with a newline character (U+000A).
> >>
> >> "any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"
> 
> >This change concerns me, because it sounds like a JSON text sequence could
> >consist of JSON texts encoded in UTF-8 and other encodings. I would instead
> >prefer "any number of JSON texts, all encoded in UTF-8,".
> 
> >> It also looks like ASCII names for RS and LF are being mixed w/Unicode
> >> codepoints in the second sentence in the abstract.  I'm not sure
> >> that's a good thing to do, especially as the body of the draft refers
> >> to RS and LF as being ASCII.  Here are a couple of changes that would
> >> remedy this:
> >>
> >>   "an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
> >>   "a newline character (U+000A)" -> "an ASCII newline character (0x0A)"
> 
> >With John Cowan's change ("an ASCII Line Feed character (0x1E)" instead of
> >"an ASCII Record Separator (0x1E)"), that would indeed be clearer.
> 
> 
> Please no. That would give an even worse mix of UTF-8 and ASCII, bytes and
> characters, in a single sentence.
> 
>   ".. any number of JSON texts, all encoded in UTF-8, each prefixed by an
> ASCII Record Separator (0x1E) .."
> 
> How about:
> 
>   "A JSON text sequence consists of any number of JSON texts,
>    each prefixed by a Record Separator (U+001E) character, and
>    each suffixed by an End of Line (U+000A) character. It is
>    UTF-8 encoded."
> 
> Say "Information Separator Two (U+001E)" if you really want to be pure.
> 
> Mention in the body that "Record Separator" and "Information Separator Two"
> are the ASCII and Unicode names for the same character (as are "Line Feed" and
> "End of Line"), which is why RS and LF are used as ABNF names.
> 
> P.S. The spec still defines the same ABNF names twice (RS, JSON-sequence):
> once as bytes; once as Unicode scalars. Yuck. Just give them different names.
> 
> --
> James Manger