Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Nico Williams <nico@xxxxxxxxxxxxxxxx> · Tue, 9 Dec 2014 18:49:30 -0600

On Tue, Dec 09, 2014 at 06:49:35PM +0000, Black, David wrote:
> > So I think we really do need to say something about top-level numbers
> > (and true, false, and null), namely: that they must be delimited by
> > whitespace, that '<RS>1234<RS>' is not a valid sequence element because
> > the number may have been truncated.  (Ditto '<RS>true<RS>', since the
> > intended text could have been 'trueish', which is invalid of course, but
> > still.)
> 
> That would be more robust, as then all JSON texts in a sequence have
> delimiters and absence of the closing delimiter clearly indicates
> truncation.

OK.

New section 2.4 text:

   While objects, arrays, and strings are self-delimited in JSON texts,
   numbers and the values 'true', 'false', and 'null' are not.  Only
   whitespace can delimit the latter four kinds of values.

   Parsers MUST check that any JSON texts that are a top-level number,
   or that might be 'true', 'false', or 'null', include JSON whitespace
   (at least one byte matching the "ws" ABNF rule from RFC7159) after
   that value; otherwise the JSON-text may have been truncated.  Note
   that the LF following each JSON text matches the "ws" ABNF rule.

   Parsers MUST drop JSON-text sequence elements consisting of
   non-self-delimited top-level values that may have been truncated
   (that are not delimited by whitespace).  Parsers can report such
   texts as warnings (including, optionally, the parsed text and/or the
   original octet string).

   For example, '<RS>123<RS>' might have been intended to carry the
   top-level number 123.4, but must have been truncated.  Similarly,
   '<RS>true<RS>' might have been intended to carry the invalid text
   'trueish'.  '<RS>truefalse<RS>' is not two top-level values, 'true'
   and 'false'; it is simply not a valid JSON text.

This is the only place where the ws rule comes up, so merely saying "at
least one byte matching" it should suffice.
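
To make the proposed 2.4 rules concrete, here is a minimal Python sketch
of a parser that applies them to a sequence held in memory.  The names
(parse_sequence, JSON_WS) are made up for illustration, and skipping
empty elements and ignoring bytes before the first RS are assumptions of
the sketch, not something the text above requires.

```python
import json

RS = b"\x1e"
JSON_WS = b" \t\n\r"  # the four bytes matched by the "ws" rule in RFC 7159


def parse_sequence(data: bytes):
    """Return (values, warnings) for a JSON text sequence held in memory.

    Hypothetical helper: elements that fail to parse, or that are
    non-self-delimited values without trailing whitespace, are dropped
    and reported in `warnings` as (original_octets, reason) pairs.
    """
    values, warnings = [], []
    # Assumption: bytes before the first RS are ignored.
    for chunk in data.split(RS)[1:]:
        if not chunk.strip(JSON_WS):
            continue  # assumption: skip empty elements
        try:
            value = json.loads(chunk.decode("utf-8"))
        except ValueError as exc:  # covers JSON and UTF-8 decoding errors
            warnings.append((chunk, str(exc)))
            continue
        self_delimited = isinstance(value, (dict, list, str))
        if not self_delimited and chunk[-1:] not in JSON_WS:
            # A number, true, false, or null with no whitespace after it
            # may have been truncated, so it is dropped.
            warnings.append((chunk, "possibly truncated"))
            continue
        values.append(value)
    return values, warnings
```

With input b'\x1e123\x1e123.4\n\x1e"foo"\x1e', this keeps 123.4 and
"foo" and reports 123 as possibly truncated.  Note that Python's json
module also accepts NaN and Infinity, which RFC 7159 does not; a
stricter parser would reject those as well.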

I'm also adding this following the above, based on your comment about
incremental parsers:

   Implementations may produce a value when parsing '<RS>"foo"<RS>'
   because their JSON text parser might be able to consume bytes
   incrementally, and since the JSON text in this case is a
   self-delimiting top-level value, the parser can produce the result
   without consuming an additional byte.  Such implementations should
   skip to the next RS byte, possibly reporting any intervening
   non-whitespace bytes.

(yes, I think this should be a 'should', not a 'SHOULD').
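
A rough Python sketch of that behaviour, for illustration only:
json.JSONDecoder.raw_decode stands in for an incremental parser, since
it stops at the end of the first value and reports where it stopped.
read_element and its return shape are hypothetical, and the input is
assumed to be already decoded to a str.

```python
import json

RS = "\x1e"
JSON_WS = " \t\n\r"
_decoder = json.JSONDecoder()


def read_element(text: str, start: int):
    """Parse the element whose RS is at text[start].

    Returns (value, stray, next_index): `value` is None when the element
    is dropped, `stray` holds any non-whitespace text found between the
    parsed value and the next RS, and `next_index` is the position of
    the next RS (or len(text) if there is none).
    """
    assert text[start] == RS
    pos = start + 1
    # raw_decode does not skip leading whitespace, so do it here.
    while pos < len(text) and text[pos] in JSON_WS:
        pos += 1
    next_rs = text.find(RS, pos)
    if next_rs == -1:
        next_rs = len(text)
    try:
        value, end = _decoder.raw_decode(text, pos)
    except ValueError:
        return None, text[pos:next_rs], next_rs
    self_delimited = isinstance(value, (dict, list, str))
    if not self_delimited and (end >= len(text) or text[end] not in JSON_WS):
        # Number, true, false, or null not followed by whitespace.
        return None, text[pos:next_rs], next_rs
    # The value is complete; skip to the next RS and collect anything
    # other than whitespace found on the way so it can be reported.
    stray = text[end:next_rs].strip(JSON_WS)
    return value, stray, next_rs
```

For '<RS>"foo"<RS>' this yields the value "foo" with nothing to report;
for '<RS>"bar" junk<LF><RS>' it yields "bar" and reports 'junk'.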

Nico
--