Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Nico Williams <nico@xxxxxxxxxxxxxxxx> · Tue, 9 Dec 2014 11:17:29 -0600

On Tue, Dec 09, 2014 at 04:41:12PM +0000, Black, David wrote:
> [A] JSON text parse failures
> > [...]
> 
> Your alternative wording "whenever the JSON text parse fails, ..." is fine.

OK.

> [D] Truncation
> 
> > A missing terminating LF is not a problem for strings, arrays, or
> > objects.  I seem to recall that we did discuss this.  We could require
> > that such texts fail to parse, but perhaps the more important thing is
> > to require common parser behavior as to such truncations.
> > 
> > Your ABNF proposal is certainly more strict than the one in the I-D.  I'm
> > neutral as to whether this form or the one in the I-D (with the ws issue
> > fixed) is better.  The stricter form is clearly easier to talk about,
> > therefore preferable, but it will mean discarding texts where only that
> > terminating LF is truncated.
> 
> I concur with both of the above paragraphs - my preference is to detect
> incomplete JSON texts at the sequence level via the missing LF rather than
> special-casing numbers and relying on failed JSON parses for everything else.
> In general, earlier detection of errors increases the options for dealing
> with them.

And, of course, a streaming/incremental parser might well output all
there is to output when only the last LF is missing but the top-level
value was properly delimited anyway.  So it's kinda difficult to get a
fool-proof requirement that the trailing LF must be present.

Your review comments included adding this note about incremental
parsing.  There's a conflict here between the two comments that had not
been apparent to me last night.  I now think that fixing the ws problem
is the best way forward.

> Once the incomplete text is detected, a JSON parse could be attempted,
> with the JSON parser knowing that the text is incomplete (e.g., text
> may fail to parse, a number at the end of the text must not be produced
> as an incremental parse result).

That's so for non-incremental parsers.  (Or when buffering the complete
text instead of handling incrementally, even though one has an
incremental parser.)

Consider one implementation I'm familiar with.  Its JSON text parser is
incremental (but not streaming), so it produces outputs with no need for
extra whitespace when the input text is a string, array, or object, but
for top-level numbers, booleans, and null, it needs to either read one
more byte or reach EOF before it will output them.
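
To make that behavior concrete, here is a rough sketch in Python.  It is
not the implementation I mention above; try_emit and at_eof are names
made up for illustration, and it leans on the standard json module's
raw_decode.  As a simplification it only accepts whitespace as the "one
more byte" that delimits a number, true, false, or null.

```python
import json

_decoder = json.JSONDecoder()
WS = " \t\n\r"  # the four JSON whitespace characters


def try_emit(buf, at_eof=False):
    """Return (value, consumed) if a top-level value can be output now, else None.

    Hypothetical helper: 'buf' is the text buffered so far for one JSON text.
    """
    text = buf.lstrip(WS)
    skipped = len(buf) - len(text)
    try:
        value, end = _decoder.raw_decode(text)
    except ValueError:
        return None  # incomplete (or invalid) so far; wait for more input
    if isinstance(value, (str, list, dict)):
        # The closing quote, bracket, or brace delimits the value by itself.
        return value, skipped + end
    # Numbers, true, false, and null are not self-delimiting.
    if at_eof and end == len(text):
        return value, skipped + end
    if end < len(text) and text[end] in WS:
        return value, skipped + end
    return None  # might still be a prefix of a longer token


print(try_emit('[1,2]'))               # array: output right away
print(try_emit('1234'))                # number with nothing after it: held back
print(try_emit('1234\n'))              # number followed by whitespace: output
print(try_emit('1234', at_eof=True))   # number at EOF: output
```

The point is only that strings, arrays, and objects can be output as soon
as their closing delimiter is seen, while the other top-level values need
something after them.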

So I think we really do need to say something about top-level numbers
(and true, false, and null), namely: that they must be delimited by
whitespace, that '<RS>1234<RS>' is not a valid sequence element because
the number may have been truncated.  (Ditto '<RS>true<RS>', since the
intended text could have been 'trueish', which is invalid of course, but
still.)
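
A sketch of what such a rule could look like on the receiving side, again
in Python and again only an illustration: parse_sequence is a
hypothetical name, the whole input is buffered rather than parsed
incrementally, empty elements are simply skipped, and anything before the
first RS is ignored.

```python
import json

RS = "\x1e"          # ASCII Record Separator (RFC 20)
WS = (" ", "\t", "\n", "\r")


def parse_sequence(data):
    """Yield (ok, value_or_raw_text) for each RS-delimited element of 'data'."""
    for chunk in data.split(RS)[1:]:
        if not chunk.strip(" \t\n\r"):
            continue  # assumption of this sketch: skip empty elements
        try:
            value = json.loads(chunk)
        except ValueError:
            yield (False, chunk)  # JSON text parse failed
            continue
        self_delimited = isinstance(value, (str, list, dict))
        if not self_delimited and not chunk.endswith(WS):
            # A number, true, false, or null with no trailing whitespace
            # may have been truncated, so it is not accepted.
            yield (False, chunk)
        else:
            yield (True, value)


sample = RS + '1234' + RS + '1234\n' + RS + '"abc"' + RS + 'true' + RS + '[1,2]'
for ok, item in parse_sequence(sample):
    print(ok, repr(item))
```

Here '<RS>1234<RS>' and '<RS>true<RS>' are rejected, while '<RS>1234<LF>'
and the undelimited string and array are accepted.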

> As for RFC 20 ...
> 
> > Is this resolved by now?  I can always reference only Unicode.
> 
> Keep the RFC 20 reference - I have no problem with it.  Moreover, as a
> result of all the hubbub around this nit, the IESG has issued a Last Call
> to reclassify RFC 20 as an Internet Standard ... so that this never
> arises again ...

Yes, I noticed.  I expect the IETF LC will pass for that.

Thanks,

Nico
--