RE: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

"Black, David" <david.black@xxxxxxx> · Tue, 9 Dec 2014 16:41:12 +0000

Nico,

Thanks for the comprehensive response.

I've excerpted a few things for further comment - I'm fine with the
proposed actions for anything not mentioned here.

[A] JSON text parse failures

> >    If parsing of such an octet string as a JSON text fails, and the
> >    octet string is followed by an RS octet, the parser
> >    SHOULD nonetheless skip ahead to that RS octet and continue parsing
> >    the remainder of the sequence from there.
> 
> That's a weird way of saying that, wherever the JSON text parse fails,
> the sequence parser should then seek forward until the next RS-valued
> byte.  But it works for me; I'll take it.

Your alternative wording "whenever the JSON text parse fails, ..." is fine.

[D] Truncation

> A missing terminating LF is not a problem for strings, arrays, or
> objects.  I seem to recall that we did discuss this.  We could require
> that such texts fail to parse, but perhaps the more important thing is
> to require common parser behavior as to such truncations.
> 
> You ABNF proposal is certainly more strict than the one in the I-D.  I'm
> neutral as to whether this form or the one in the I-D (with the ws issue
> fixed) is better.  The stricter form is clearly easier to talk about,
> therefore preferable, but it will mean discarding texts where only that
> terminating LF is truncated.

I concur with both of the above paragraphs - my preference is to detect
incomplete JSON texts at the sequence level via the missing LF rather than
special-casing numbers and relying on failed JSON parses for everything else.
In general, earlier detection of errors increases the options for dealing
with them.

Once the incomplete text is detected, a JSON parse could be attempted,
with the JSON parser knowing that the text is incomplete (e.g., text
may fail to parse, a number at the end of the text must not be produced
as an incremental parse result).

I'll defer to the WG on whether/how to expand the decoder ABNF to better
detect incomplete texts and how to deal with any incomplete texts that are
detected.

As for RFC 20 ...

> > Nits/editorial comments:
> >
> > idnits didn't like the reference to RFC 20 for ASCII:
> >
> >   ** Downref: Normative reference to an Unknown state RFC: RFC   20
> 
> Is this resolved by now?  I can always reference only Unicode.

Keep the RFC 20 reference - I have no problem with it.  Moreover, as a
result of all the hubbub around this nit, the IESG has issued a Last Call
to reclassify RFC 20 as an Internet Standard ... so that this never
arises again ...

http://www.ietf.org/mail-archive/web/ietf-announce/current/msg13546.html

Thanks,
--David

> -----Original Message-----
> From: Nico Williams [mailto:nico@xxxxxxxxxxxxxxxx]
> Sent: Monday, December 08, 2014 11:20 PM
> To: Black, David
> Cc: General Area Review Team (gen-art@xxxxxxxx); ops-dir@xxxxxxxx;
> ietf@xxxxxxxx; json@xxxxxxxx
> Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
> 
> On Fri, Dec 05, 2014 at 02:51:04PM +0000, Black, David wrote:
> > This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows
> ...
> 
> Thanks you.
> 
> > Minor issues:
> >
> > [A] Section 2.1:
> >
> >    If parsing of such an octet string as a JSON text fails, the parser
> >    SHOULD nonetheless continue parsing the remainder of the sequence.
> >
> > That's not quite right - there are two levels of parsing, JSON
> > sequence parsing and JSON text parsing of each text in the sequence,
> > both of which might be implemented in a single-pass parser.  For such an
> > implementation, the above sentence could be (mis-)read to imply that the
> > JSON text parse should resume from the point at which it failed, which
> > would be silly (although I've seen heroic PL/1 parsers do exactly that).
> > Instead, the parse needs to skip ahead to the next RS, ignoring the rest
> > of the JSON text that failed to parse.  I suggest:
> 
> Good point.
> 
> >    If parsing of such an octet string as a JSON text fails, and the
> >    octet string is followed by an RS octet, the parser
> >    SHOULD nonetheless skip ahead to that RS octet and continue parsing
> >    the remainder of the sequence from there.
> 
> That's a weird way of saying that, wherever the JSON text parse fails,
> the sequence parser should then seek forward until the next RS-valued
> byte.  But it works for me; I'll take it.
> 
> > [B] Section 2.3:
> >
> > Is incremental parsing of a JSON text within a sequence allowed, or
> > is the parser required to not produce any results until the parse of
> > the entire text is successful?  I'd expect that incremental parsing
> > is ok (so results may be produced from a text that ultimately fails
> > to parse), and I think that's worth stating.
> 
> It's permitted, of course, with the caveat that all incremental parsers
> have: the parse can ultimately fail.  I'll add this note:
> 
>    Incremental JSON text parsers may be used, though of course failure
>    to parse a given text may result after first producing some
>    incremental parse results.
> 
> to section 2.3.
> 
> > [C] Section 2.4:
> >
> >    Parsers MUST check that any JSON texts that are a top-level number
> >    include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
> >    number, otherwise the JSON-text may have been truncated.
> >
> > That reference to the "ws" rule doesn't get the job done because that
> > rule allows a match to no characters - it's of the form ws = *( ... )
> > where ... is the list of whitespace characters.  What's needed here is
> > a rule of the form vws = 1*( ...) to force there to be at least one
> > whitespace character, but see the next issue for a better way to deal
> > with this topic by pulling the appended LF into the sequence parse
> > instead of the text parse.
> 
> I'd wanted to not have to list the characters that are considered
> whitespace.  I agree that the "ws" rule is not appropriate though.
> 
> > [D] I wonder whether the possibility of incomplete texts ought to be
> > encoded into the parsing rules to directly catch JSON texts that must
> > be incomplete because the last character is not LF, e.g.:
> 
> A missing terminating LF is not a problem for strings, arrays, or
> objects.  I seem to recall that we did discuss this.  We could require
> that such texts fail to parse, but perhaps the more important thing is
> to require common parser behavior as to such truncations.
> 
> You ABNF proposal is certainly more strict than the one in the I-D.  I'm
> neutral as to whether this form or the one in the I-D (with the ws issue
> fixed) is better.  The stricter form is clearly easier to talk about,
> therefore preferable, but it will mean discarding texts where only that
> terminating LF is truncated.
> 
> One problem we get into with your ABNF is that RFC5234 does not discuss
> greediness (this came up on the list).  But this can be noted (see
> below).
> 
> One nice thing about the stricter rules is that we need not discuss
> top-level numbers (or booleans, or null) with normative text, just note
> them.
> 
> >      JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
> >      RS = %x1E; "record separator" (RS), see RFC20
> >      possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
> >                                ; JSON text (see RFC7159)
> >      truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
> > 					; to parse as JSON text
> >      empty-JSON = LF ; only the LF appended by the encoder, nothing to parse
> >
> >      not-RS = %x00-1D / %x1F-FF; any octet other than RS
> >      not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF
> >
> > Note that this won't detect all incomplete JSON texts, because LF is allowed
> > within a JSON text (and this should be stated).
> 
> It will if ABNF matching is greedy and possible-JSON is defined as
> 
>    possible-JSON = 1*( 1*(not-RS) LF); ...
> 
> Advice as to which form to take?
> 
> > [E] Section 3 - Security Considerations
> >
> > Incomplete and malformed JSON texts can be used to attack JSON parsers -
> > that should be pointed out, as I don't see that in RFC 7159's security
> > considerations and incomplete texts are a relevant consideration for
> > this draft.
> 
> And surely also for RFC7159.  Besides requiring that they fail to parse
> (and the note about incremental parsing), are there any other missing
> considerations?  Ah yes, smuggling of data in sequences where parsers
> will ignore without warning any octet strings that fail to parse as JSON
> texts.
> 
> Proposed text:
> 
>    As usual, parsers must operate on as-good-as untrusted input. This
>    means that parsers must fail gracefully in the face of malicious
>    inputs. Note that incremental parsers can produce partial results and
>    later indicate failure to parse the remainder of a text. Note that
>    texts that fail to parse and are ignored can be used to smuggle data
>    past sequence parsers that don't warn about JSON text failures.
> 
> > [F] Section 4 - IANA Considerations
> >
> >    Security considerations: See <this document, once published>,
> >    Section 3.
> >
> >    Interoperability considerations: Described herein.
> >
> >    Published specification: <this document, once published>.
> >
> >    Applications that use this media type: <by publication time
> >    <https://stedolan.github.io/jq> is likely to support this format>.
> >
> > Replace all three instances of the angle bracketed text.  The first two
> > instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
> > Editor to insert the number of the RFC when published.  The third instance
> > should be resolved now, or could have an RFC Editor note added indicating
> > that the author will resolve that during Authors 48 hours.
> 
> Sure.
> 
> > Nits/editorial comments:
> >
> > idnits didn't like the reference to RFC 20 for ASCII:
> >
> >   ** Downref: Normative reference to an Unknown state RFC: RFC   20
> 
> Is this resolved by now?  I can always reference only Unicode.
> 
> Thanks for your excellent review,
> 
> Nico
> --