RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

"Black, David" <david.black@xxxxxxx> · Mon, 8 Dec 2014 02:59:44 +0000

I agree with Patrik - this draft assumes UTF-8 encoding and should
state that requirement explicitly.  John's proposed text change below
is in section 2.1 for the decoder; the encoder text in section 2.2
needs a corresponding change:

OLD
   In prose: any number of JSON texts, each preceded by one ASCII RS
   character and each followed by a line feed (LF).
NEW
   In prose: any number of JSON texts encoded as UTF-8, each preceded
   by one ASCII RS character and each followed by a line feed (LF).
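
As a rough illustration of how the encoder text above and the decoder text
quoted below fit together, here is a minimal Python sketch. It is not part of
the draft. The names encode_sequence and parse_sequence are hypothetical, and
the parser is deliberately lenient: it makes no attempt to detect truncated
texts.

```python
import json

RS = b"\x1e"  # ASCII record separator (0x1E)
LF = b"\n"    # line feed


def encode_sequence(values):
    """Emit each value as RS + JSON text encoded as UTF-8 + LF."""
    out = bytearray()
    for value in values:
        # json.dumps escapes control characters, so a raw 0x1E octet
        # never appears inside the JSON text itself.
        text = json.dumps(value, ensure_ascii=False)
        out += RS + text.encode("utf-8") + LF
    return bytes(out)


def parse_sequence(data):
    """Split on RS octets and parse each octet string as UTF-8 JSON."""
    results = []
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # skip the empty string before the first RS
        results.append(json.loads(chunk.decode("utf-8")))
    return results
```

Splitting on the single octet 0x1E is only safe because the texts are UTF-8:
in UTF-16 or UTF-32 that octet value can occur as part of an unrelated
character.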

Thanks,
--David

> -----Original Message-----
> From: John Cowan [mailto:cowan@xxxxxxxx] On Behalf Of John Cowan
> Sent: Sunday, December 07, 2014 4:08 PM
> To: Patrik Fältström
> Cc: Black, David; Nico Williams; General Area Review Team (gen-art@xxxxxxxx);
> json@xxxxxxxx; ops-dir@xxxxxxxx; ietf@xxxxxxxx
> Subject: Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
> 
> Patrik Fältström scripsit:
> 
> > I.e. the way I read draft-ietf-json-text-sequence (and I might be
> > wrong), you have specific octet values that act as separators. That
> > only works if the encoding is UTF-8.
> 
> This is a binary representation which has embedded JSON texts represented
> in UTF-8.  Since the first character in a JSON text is necessarily in
> the ASCII repertoire, it is not possible to parse a UTF-16 or UTF-32
> JSON text as UTF-8 and come out with valid JSON.
> 
> However, I grant that mentioning UTF-8 only in an ABNF comment is not
> really prominent enough.  Proposed wording change:
> 
> For:
> 
>    In prose: a series of octet strings, each containing any octet other
>    than a record separator (RS) (0x1E) [RFC0020], all octet strings
>    separated from each other by RS octets.  Each octet string in the
>    sequence is to be parsed as a JSON text.
> 
> read:
> 
>    In prose: a series of octet strings, each containing any octet other
>    than a record separator (RS) (0x1E) [RFC0020], all octet strings
>    separated from each other by RS octets.  Each octet string in the
>    sequence is to be parsed as a JSON text in UTF-8 encoding.
> 
> and add a suitable reference to UTF-8.
> 
> > Ok, so what you say is that a string in an attribute value in the JSON
> > blob can still start with U+FEFF?
> 
> Just so.
> 
> 
> --
> John Cowan          http://www.ccil.org/~cowan        cowan@xxxxxxxx
> As we all know, civil libertarians are not the friskiest group around --
> comes from  forever being on the qui vive for the sound of jack-booted
> fascism coming down the pike.           --Molly Ivins