Re: Gen-ART LC review of draft-ietf-eai-utf8headers-09.txt

"Spencer Dawkins" <spencer@xxxxxxxxxxxxxxxxx> · Sun, 23 Mar 2008 17:23:43 -0500

Hi, Harald,

Thanks for the quick feedback (Gen-ART reviewers like this because we can 
remember writing the review, and at least part of what we were thinking 
about :-)

Looks mostly good. Where we're in sync, I've dropped the item from this 
e-mail.

Spencer

>> 1.2.  Relation to other standards
>>
>>   This document also updates [RFC2822] and MIME, and the fact that an
>>   experimental specification updates a standards-track spec means that
>>   people who participate in the experiment have to consider those
>>   standards updated.
>>
>> Process: The ID Tracker is showing this draft in Last Call status, but I
>> can't find (in the archive or in my personal folders) any Last Call
>> announcement, which I was looking for, in order to check how Chris
>> explained the downref at Last Call time - I'm expecting that it will be
>> quite entertaining. Has anyone else seen such an announcement on IETF
>> Announce?
> Note: Intended status is Experimental.
>
> The subject line of the Last Call was
>
> Last Call: draft-ietf-eai-smtpext (SMTP extension for internationalized 
> email address) to Experimental RFC
>
> and covered 2 drafts; this may be why you did not find it.

Exactly right (I was scanning by subject). While I'm amazed that the downref 
isn't being called out in the Last Call announcement, I think RFC tracks and 
standards levels are so arbitrary that they are useless, so I'm not 
complaining - I was trying to figure out if there really had been a Last 
Call announcement sent, that's all.

>> 4.  Changes on Message Header Fields
>>
>>   This protocol does NOT change the definition of header field names.
>>
>> technical: I'm confused here. Is this text saying "does not change header
>> field names"? I would have thought this specification is exactly changing
>> the definition of header field names...
> It does not change the definition of header field NAMES (which remain 
> ASCII), but changes the definition of header field BODIES (which used to 
> be ASCII, but are now UTF-8).
>>
>>   That is, only the bodies of header fields are allowed to have UTF-8
>>   characters; the rules in [RFC2822] for header field names are not
>>   changed.
> And this sentence is saying that. How can we express this more clearly?

Ah. You filled in the missing piece for me here. Perhaps something like

"This protocol does NOT change the [RFC2822] rules for defining header field 
names. The bodies of header fields are allowed to contain UTF-8 characters, 
but the header field names themselves must contain only ASCII characters."
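
To make the name/body distinction concrete, here is a minimal sketch in Python. The function name split_header_field is hypothetical (it is not from the draft), and the sketch assumes the field has already been unfolded onto one line. It only illustrates the rule as discussed above: field names stay printable ASCII per [RFC2822], while bodies may carry UTF-8.

```python
from typing import Tuple


def split_header_field(line: bytes) -> Tuple[str, str]:
    """Split one unfolded header field into (name, body).

    Hypothetical helper for illustration only.
    """
    name_bytes, sep, body_bytes = line.partition(b":")
    if not sep:
        raise ValueError("not a header field: missing colon")
    # RFC 2822 field names: printable US-ASCII (33-126), colon excluded.
    # The colon is already excluded because we split on the first one.
    if not name_bytes or any(b < 33 or b > 126 for b in name_bytes):
        raise ValueError("header field name must be printable ASCII")
    name = name_bytes.decode("ascii")
    # The body may contain UTF-8; invalid UTF-8 raises UnicodeDecodeError.
    body = body_bytes.decode("utf-8").strip()
    return name, body


print(split_header_field("Subject: Gr\u00fc\u00dfe".encode("utf-8")))
```

A field such as "S\u00fcbject: hi" would be rejected by this sketch, because the non-ASCII character sits in the name rather than the body.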

>>   Interoperability considerations:  The media type provides
>>      functionality similar to the message/rfc822 content type for email
>>      messages with international email headers.  When there is a need
>>      to embed or return such content in another message, there is
>>      generally an option to use this media type and leave the content
>>      unchanged or downconvert the content to message/rfc822.  Both of
>>      these choices will interoperate with the installed base, but with
>>      different properties.  Systems unaware of international headers
>>      will typically treat a message/global body part as an unknown
>>      attachment, while they will understand the structure of a message/
>>      rfc822.  However, systems which understand message/global will
>>      provide functionality superior to the result of a down-conversion
>>      to message/rfc822.  The most interoperable choice depends on the
>>      deployed software.
>>
>> technical: not sure what the last sentence actually means. "We don't know
>> what the most interoperable choice will be"? Text in the same paragraph
>> says both choices are interoperable. If that text is correct, I don't
>> understand what you're saying here.
> Would it be better to say "the most useful choice"? It's likely to be the 
> difference between a compliant MUA offering to dump the message to a file 
> and displaying it as a message...

"The most useful choice" seems very reasonable. The current text seems to 
contradict other text in the same paragraph.
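
One way to picture "the most useful choice" is as a small decision function. The sketch below is an assumption-laden illustration, not anything the draft specifies: the name choose_embedding, the recipient_understands_global flag, and the policy of keeping message/rfc822 for pure-ASCII headers are all hypothetical.

```python
def choose_embedding(header_block: bytes, recipient_understands_global: bool) -> str:
    """Pick a media type for embedding a message (illustrative policy only)."""
    try:
        header_block.decode("ascii")
        # Assumption: pure-ASCII headers can stay as message/rfc822.
        return "message/rfc822"
    except UnicodeDecodeError:
        pass
    if recipient_understands_global:
        # Content is left unchanged.
        return "message/global"
    # Otherwise the content would have to be downconverted first.
    return "message/rfc822 (after down-conversion)"


print(choose_embedding("Subject: Gr\u00fc\u00dfe\r\n".encode("utf-8"), True))
```

In practice a sender often cannot know what the receiving software understands, which is why the draft leaves the choice open.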

>> 5.  Security Considerations
>>
>>   Because UTF-8 often requires several octets to encode a single
>>   character, internationalized local parts may cause mail addresses to
>>   become longer.  As specified in [RFC2822], each line of characters
>>   MUST be no more than 998 octets, excluding the CRLF.
>>
>> clarity: s/CRLF/CRLF, even when UTF-8 characters are being used/
>>
>>   Because internationalized local parts may cause email addresses to be
>>   longer, processes which parse, store, or handle email addresses or
>>   local parts must take extra care not to overflow buffers, truncate
>>   addresses, exceed storage allotments, or, when comparing, fail to use
>>   the entire length.
>>
>> technical: this is great advice, but I don't understand how UTF-8 changes
>> the situation. If you aren't changing the 998-octet requirement, software
>> that breaks for UTF-8 would also break for ASCII headers with the same
>> octet length.
> If someone uses another representation internally (for instance UTF-16), 
> and has a 998-character buffer, that will sometimes fit into 998 octets of 
> UTF-8, and sometimes not. The same goes in the other direction.... I'm 
> sure others will think of other cases.

Thanks for the clear explanation here. This is headed in the right 
direction - I wasn't impressed with guidance that says "take extra care", 
but saying "must accommodate 998 characters (which may require more than 998 
octets, depending on the character set in use), and must not overflow 
buffers, ..." seems clear enough to me.
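
Harald's UTF-16 example can be checked with a few lines of Python. This is only a sketch: the helper names sizes and fits_on_wire and the constant MAX_LINE_OCTETS are made up for illustration, and the test strings are arbitrary. It shows the same 998-character line measured in code points, UTF-8 octets, and UTF-16 code units.

```python
MAX_LINE_OCTETS = 998  # RFC 2822 line limit, excluding the trailing CRLF


def sizes(text: str) -> dict:
    """Report the length of text in three different units."""
    return {
        "code_points": len(text),
        "utf8_octets": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


def fits_on_wire(text: str) -> bool:
    """True if the UTF-8 encoding of one line stays within the octet limit."""
    return len(text.encode("utf-8")) <= MAX_LINE_OCTETS


ascii_line = "a" * 998
greek_line = "\u03b1" * 998  # GREEK SMALL LETTER ALPHA, 2 octets each in UTF-8

print(sizes(ascii_line))
print(sizes(greek_line))
print(fits_on_wire(ascii_line), fits_on_wire(greek_line))
```

Both lines fill a 998-unit UTF-16 buffer exactly, but the Greek line needs 1996 octets as UTF-8, so a check written in one unit says nothing reliable about the other.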

> Hope this helped....

Extremely. Thanks for explaining, too.

Spencer 
