RE: Gen-ART review of draft-ietf-ipfix-text-adt-05

"Black, David" <david.black@xxxxxxx> · Wed, 28 May 2014 09:58:35 -0400

Brian,

Thanks for the response.

Regarding Unicode string normalization and preparation, if those were
appropriate, they should have been introduced in RFC 7011, and picked
up here by reference.  If there's an actual problem that requires
those techniques (unclear), a revision to RFC 7011 would be the right
place to address that problem, not this draft.

I would note that the sheer volume of monitoring data tends to require
automatic processing by recipients, and such tools are likely to compare
strings in monitoring data. 

I definitely agree that some text about Enclosing Context rules/requirements
for Unicode usage would help, e.g., along the lines of:

> We largely presume these to be the responsibility of the Enclosing Context (in
> this document) and the Metering Process that created the string in the first
> place. Neither 7011 nor this representation presents additional considerations
> here: we just take the Unicode string we get from whoever wants to send/store
> it and shovel it out.

and

> > Lots of mischief is possible with non-printing and control characters -
> > I would expect that the Enclosing Context contains sufficient restrictions
> > on use of Unicode to deal with most of this concern, and would state that
> > expectation.  This comment is definitely specific to this draft.

So, please do add some text on this topic.
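
As an illustration of the kind of restriction an Enclosing Context (or a
recipient's tooling) might apply, here is a minimal Python sketch that flags
non-printing characters in a string value. The function name
find_nonprinting and the choice of Unicode general categories (Cc and Cf)
are assumptions made for this example; neither comes from the draft or
from RFC 7011.

```python
import unicodedata

# General categories treated as suspect in this sketch (an assumption):
# Cc = control characters, Cf = format characters such as bidi overrides.
SUSPECT_CATEGORIES = {"Cc", "Cf"}

def find_nonprinting(s: str) -> list:
    """Return (index, code point label) pairs for each suspect character in s."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(s)
        if unicodedata.category(ch) in SUSPECT_CATEGORIES
    ]

# Hypothetical interface name with a trailing RIGHT-TO-LEFT OVERRIDE (U+202E).
hits = find_nonprinting("eth0\u202e")
```

A recipient could reject or escape a value whenever the returned list is
non-empty; which policy is appropriate is up to the Enclosing Context.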

> > A general warning about unreliability of Unicode string comparison
> > is in order.  This also applies if an identifier that is not limited
> > to ASCII characters is substituted for an integer as described in
> > Section 4.2.
> 
> Good point; is there a good cite for this?

Try RFC 6885 - Stringprep Revision and Problem Statement
for the Preparation and Comparison of Internationalized Strings (PRECIS)
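
To make the comparison concern concrete, here is a minimal Python sketch
showing two strings that render identically but compare unequal until both
are normalized. The sample strings and the helper name same_after_nfc are
illustrative assumptions; the draft does not mandate NFC or any other
normalization form.

```python
import unicodedata

# Two visually identical spellings of the same word (illustrative values).
composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Code-point-by-code-point comparison treats the two spellings as different.
naive_equal = composed == decomposed

# Normalizing both sides first makes them compare equal.
normalized_equal = same_after_nfc(composed, decomposed)
```

Normalization addresses only this one class of mismatch; it does nothing
for visually similar characters from different scripts.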

> > Section 4.1.5 of the PRECIS framework draft warns against use of
> > mixed-direction Unicode strings, as "there is currently no widely
> > accepted and implemented solution for the processing and safe display
> > of mixed-direction strings."  That warning deserves repetition here.

Please repeat that warning, probably with an informative reference to the
PRECIS framework draft.
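
For tools that want to flag such strings before display, a rough Python
sketch follows. It is a crude heuristic based on the bidirectional class of
each character, not an implementation of any rule from the PRECIS framework
draft; the function name is_mixed_direction and the class sets are
assumptions made for this example.

```python
import unicodedata

# Strong bidirectional classes used by this heuristic (an assumption):
LTR_CLASSES = {"L"}        # strong left-to-right
RTL_CLASSES = {"R", "AL"}  # strong right-to-left, including Arabic letters

def is_mixed_direction(s: str) -> bool:
    """Return True if s has both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in s}
    return bool(classes & LTR_CLASSES) and bool(classes & RTL_CLASSES)

# Latin letters followed by two Hebrew letters (illustrative value).
mixed = is_mixed_direction("abc \u05d0\u05d1")
```

A string flagged this way could be logged in escaped form rather than
rendered directly.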

Thanks,
--David

> -----Original Message-----
> From: Brian Trammell [mailto:ietf@xxxxxxxxxxx]
> Sent: Wednesday, May 28, 2014 5:22 AM
> To: Black, David
> Cc: General Area Review Team (gen-art@xxxxxxxx); ipfix@xxxxxxxx; ietf@xxxxxxxx
> Subject: Re: Gen-ART review of draft-ietf-ipfix-text-adt-05
> 
> hi David,
> 
> Many thanks for the review. Comments and questions inline.
> 
> On 24 May 2014, at 04:10, Black, David <david.black@xxxxxxx> wrote:
> 
> > I am the assigned Gen-ART reviewer for this draft. For background on
> > Gen-ART, please see the FAQ at
> >
> > <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
> >
> > Please resolve these comments along with any other Last Call comments
> > you may receive.
> >
> > Document: draft-ietf-ipfix-text-adt-05
> > Reviewer: David L. Black
> > Review Date: May 23, 2014
> > IETF LC End Date: May 28, 2014
> >
> > Summary:  This draft is on the right track, but has open issues
> > 		described in the review.
> >
> > This is a relatively short draft defining textual representations of
> > IPFIX data elements.  It's clear and easy to read.
> >
> > I assume that all the ABNF has been checked.
> 
> Yep.
> 
> > The open issues involve use of Unicode.
> 
> This is unsurprising. :)
> 
> > Minor issues:
> >
> > Section 4.7 string
> >
> >   As Information Elements of the string type are simply UTF-8 encoded
> >   strings, they are represented directly, subject to the escaping and
> >   encoding rules of the Enclosing Context.
> >
> > There's nothing "simply" about use of UTF-8 encoded strings :-).
> 
> Well, no. But most of the string normalization nightmares with UTF-8 are also
> problems in the Enclosing Contexts, so the assumption here is that if
> you're already representing strings in, say, UTF-8 encoded JSON, you're
> already paying the UTF-8 tax, so this document presents no additional
> considerations.
> 
> There is an additional point here, though, that the Enclosing Context may
> prefer or require a different encoding than UTF-8, so we should address that:
> 
> 	As Information Elements of the string type are simply Unicode
> 	strings (encoded as UTF-8 when appearing in Data Sets in IPFIX
> 	Messages [RFC 7011]), they are represented directly, using the
> 	Unicode encoding rules and quoting and escaping rules of the
> 	Enclosing Context.
> 
> > There appear to be no restrictions on Unicode codepoint usage and no
> > requirements for string normalization or other preparation either in this
> > draft or RFC 7011.
> 
> We largely presume these to be the responsibility of the Enclosing Context (in
> this document) and the Metering Process that created the string in the first
> place. Neither 7011 nor this representation presents additional considerations
> here: we just take the Unicode string we get from whoever wants to send/store
> it and shovel it out.
> 
> I'm not sure it makes sense to enforce normalization in this context.
> 
> >  This can be a formula for all sorts of mischief, so
> > some warnings about what's possible should be added somewhere - some of
> > these comments may be raising Unicode concerns in RFC 7011 that would
> > be better addressed there.
> 
> Generally, text-adt is intended to allow the use of the IPFIX Information
> Model separately from the protocol described in RFC 7011, so any concern there
> should also be raised here.
> 
> > A general warning about unreliability of Unicode string comparison
> > is in order.  This also applies if an identifier that is not limited
> > to ASCII characters is substituted for an integer as described in
> > Section 4.2.
> 
> Good point; is there a good cite for this?
> 
> >  In addition, the concerns around visually similar
> > characters discussed in section 10.5 of the PRECIS framework draft
> > (draft-ietf-precis-framework) apply; a short summary and pointer
> > to that section of that draft should suffice.
> >
> > Section 4.1.5 of the PRECIS framework draft warns against use of
> > mixed-direction Unicode strings, as "there is currently no widely
> > accepted and implemented solution for the processing and safe display
> > of mixed-direction strings."  That warning deserves repetition here.
> >
> > Lots of mischief is possible with non-printing and control characters -
> > I would expect that the Enclosing Context contains sufficient restrictions
> > on use of Unicode to deal with most of this concern, and would state that
> > expectation.  This comment is definitely specific to this draft.
> >
> > Nits/editorial comments:
> >
> > Section 4.4 float32 and float64
> >
> >   exponent = ( "e" / "E" ) [sign] 1*3DIGIT
> >
> > Please explain why no more than 3 digits are ever required.
> 
> Will do (i.e., because the magnitudes representable by these types range
> from O(1e-3xx) to O(1e3xx), so a three-digit exponent always suffices).
> 
> > Section 4.8 dateTime*
> >
> > The '*' in the section title, dateTime*, is clever, but its meaning is not
> > obvious.  I suggest "The dateTime Data Types" as a better section title.
> 
> Yep, thanks.
> 
> > Section 5 Security Considerations
> >
> >   The security considerations for the IPFIX Protocol [RFC7011] apply;
> >   this document presents no additional security considerations.
> >
> > That's ok, although adding a direct mention of the [UTF8-EXPLOIT] TR
> > cited in RFC 7011 would be helpful.
> 
> Will do.
> 
> > idnits 2.13.01 warns that the JSON reference (RFC 4627) is obsolete, and
> > needs to be replaced with one or two current RFC references.
> 
> Oops; will replace.
> 
> Thanks again, cheers,
> 
> Brian