Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Nico Williams <nico@xxxxxxxxxxxxxxxx> · Mon, 8 Dec 2014 22:19:42 -0600

On Fri, Dec 05, 2014 at 02:51:04PM +0000, Black, David wrote:
> This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows ...

Thanks you.

> Minor issues:
> 
> [A] Section 2.1:
> 
>    If parsing of such an octet string as a JSON text fails, the parser
>    SHOULD nonetheless continue parsing the remainder of the sequence.
> 
> That's not quite right - there are two levels of parsing, JSON
> sequence parsing and JSON text parsing of each text in the sequence,
> both of which might be implemented in a single-pass parser.  For such an
> implementation, the above sentence could be (mis-)read to imply that the
> JSON text parse should resume from the point at which it failed, which
> would be silly (although I've seen heroic PL/1 parsers do exactly that).
> Instead, the parse needs to skip ahead to the next RS, ignoring the rest
> of the JSON text that failed to parse.  I suggest:

Good point.

>    If parsing of such an octet string as a JSON text fails, and the
>    octet string is followed by an RS octet, the parser
>    SHOULD nonetheless skip ahead to that RS octet and continue parsing
>    the remainder of the sequence from there.

That's a weird way of saying that, wherever the JSON text parse fails,
the sequence parser should then seek forward until the next RS-valued
byte.  But it works for me; I'll take it.

> [B] Section 2.3:
> 
> Is incremental parsing of a JSON text within a sequence allowed, or
> is the parser required to not produce any results until the parse of
> the entire text is successful?  I'd expect that incremental parsing
> is ok (so results may be produced from a text that ultimately fails
> to parse), and I think that's worth stating.

It's permitted, of course, with the caveat that all incremental parsers
have: the parse can ultimately fail.  I'll add this note:

   Incremental JSON text parsers may be used, though of course failure
   to parse a given text may result after first producing some
   incremental parse results.

to section 2.3.

> [C] Section 2.4:
> 
>    Parsers MUST check that any JSON texts that are a top-level number
>    include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
>    number, otherwise the JSON-text may have been truncated.
> 
> That reference to the "ws" rule doesn't get the job done because that
> rule allows a match to no characters - it's of the form ws = *( ... )
> where ... is the list of whitespace characters.  What's needed here is
> a rule of the form vws = 1*( ...) to force there to be at least one
> whitespace character, but see the next issue for a better way to deal
> with this topic by pulling the appended LF into the sequence parse
> instead of the text parse.

I'd wanted to not have to list the characters that are considered
whitespace.  I agree that the "ws" rule is not appropriate though.

> [D] I wonder whether the possibility of incomplete texts ought to be
> encoded into the parsing rules to directly catch JSON texts that must
> be incomplete because the last character is not LF, e.g.:

A missing terminating LF is not a problem for strings, arrays, or
objects.  I seem to recall that we did discuss this.  We could require
that such texts fail to parse, but perhaps the more important thing is
to require common parser behavior as to such truncations.

You ABNF proposal is certainly more strict than the one in the I-D.  I'm
neutral as to whether this form or the one in the I-D (with the ws issue
fixed) is better.  The stricter form is clearly easier to talk about,
therefore preferable, but it will mean discarding texts where only that
terminating LF is truncated.

One problem we get into with your ABNF is that RFC5234 does not discuss
greediness (this came up on the list).  But this can be noted (see
below).

One nice thing about the stricter rules is that we need not discuss
top-level numbers (or booleans, or null) with normative text, just note
them.

>      JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
>      RS = %x1E; "record separator" (RS), see RFC20
>      possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
>                                ; JSON text (see RFC7159)
>      truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
> 					; to parse as JSON text
>      empty-JSON = LF ; only the LF appended by the encoder, nothing to parse
> 
>      not-RS = %x00-1D / %x1F-FF; any octet other than RS
>      not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF
> 
> Note that this won't detect all incomplete JSON texts, because LF is allowed
> within a JSON text (and this should be stated).

It will if ABNF matching is greedy and possible-JSON is defined as

   possible-JSON = 1*( 1*(not-RS) LF); ...

Advice as to which form to take?

> [E] Section 3 - Security Considerations
> 
> Incomplete and malformed JSON texts can be used to attack JSON parsers -
> that should be pointed out, as I don't see that in RFC 7159's security
> considerations and incomplete texts are a relevant consideration for
> this draft.

And surely also for RFC7159.  Besides requiring that they fail to parse
(and the note about incremental parsing), are there any other missing
considerations?  Ah yes, smuggling of data in sequences where parsers
will ignore without warning any octet strings that fail to parse as JSON
texts.

Proposed text:

   As usual, parsers must operate on as-good-as untrusted input. This
   means that parsers must fail gracefully in the face of malicious
   inputs. Note that incremental parsers can produce partial results and
   later indicate failure to parse the remainder of a text. Note that
   texts that fail to parse and are ignored can be used to smuggle data
   past sequence parsers that don't warn about JSON text failures.

> [F] Section 4 - IANA Considerations
> 
>    Security considerations: See <this document, once published>,
>    Section 3.
> 
>    Interoperability considerations: Described herein.
> 
>    Published specification: <this document, once published>.
> 
>    Applications that use this media type: <by publication time
>    <https://stedolan.github.io/jq> is likely to support this format>.
> 
> Replace all three instances of the angle bracketed text.  The first two
> instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
> Editor to insert the number of the RFC when published.  The third instance
> should be resolved now, or could have an RFC Editor note added indicating
> that the author will resolve that during Authors 48 hours.

Sure.

> Nits/editorial comments:
> 
> idnits didn't like the reference to RFC 20 for ASCII:
> 
>   ** Downref: Normative reference to an Unknown state RFC: RFC   20

Is this resolved by now?  I can always reference only Unicode.

Thanks for your excellent review,

Nico
--