Hello Dave, others,
On 03/03/2021 23:40, Dave Crocker wrote:
> I'm finally able to get some time for this. And I'm finding myself
> thinking of the interaction between ietf perspective and unicode
> perspective. ietf perspective uses the term octet. I think there can
> be some benefit in mixing the terms, to try to connect them, for the
> reader.
This may be an okay 50,000-foot summary, but it is in no way
appropriate for an actual protocol spec. Also, there are many IETF specs
that use Unicode code points and many parts of Unicode that use the term
octet. The term octet is as appropriate e.g. for MTU or HTTP
Content-Length as it is for the result of encoding characters in UTF-8.
The term codepoint (or code point) is as appropriate e.g. in RFC 3987 as
it is somewhere in the Unicode spec.

"There may be some benefit of mixing the terms" sounds extremely vague,
and the text below indeed turns out that way. The connection between
these terms has to be very precise.
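To make the distinction concrete, here is a quick Python sketch (the
emoji modifier sequence chosen is just an example): the sequence is two
code points regardless of encoding, while the number of octets falls out
of the encoding chosen.

```python
# The same emoji sequence, viewed as code points vs. as octets.
# U+1F44D (THUMBS UP SIGN) + U+1F3FD (skin tone modifier) is two
# code points; its octet form depends entirely on the encoding.
s = "\U0001F44D\U0001F3FD"

code_points = [hex(ord(c)) for c in s]
print(code_points)                 # ['0x1f44d', '0x1f3fd']

utf8_octets = s.encode("utf-8")
print(len(s), len(utf8_octets))    # 2 code points, 8 octets
```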
> Consequently, I propose:
>
> On 3/2/2021 6:34 AM, Ricardo Signes wrote:
>> One:
>>
>>    The rule emoji_sequence is inherited from [Emoji-Seq]. It defines a
>>    set of octet sequences, each of which forms a single pictograph.
>>
>> I would replace "octet" with "code point". The referenced document
>> only describes sequences of code points. The encoding of those into
>> octets is orthogonal, and will be described by the content-type and
>> content-transfer-encoding jointly. So, I think this change is a
>> definite improvement to accuracy, and is worth making.
> NEW:
>
>    <t>The ABNF rule emoji_sequence is inherited from <xref
>    target="Emoji-Seq"/>. It defines a set of octet sequences, each of
>    which forms a single pictograph.
Sorry, this is wrong. The ABNF rule in [Emoji-Seq] does not define
a set of octet sequences. It defines a set of code point sequences, and
whether or how those end up as octet sequences is left undefined in that
document.
>    The BNF syntax used in [Emoji-Seq] differs
>    from <xref target="ABNF"/>, and MUST be interpreted as used in Unicode
>    documentation. The referenced document describes these as sequences of
>    code points.
So how do you get octets from those code points? You don't say. Please
just use the wording that Ned provided. That wording makes things clear.
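The point can be shown in a couple of lines (a Python sketch; the code
point is arbitrary): the code point sequence is fixed by [Emoji-Seq],
but the octets you get depend entirely on which charset is chosen
elsewhere, e.g. via Content-Type, so "octet sequence" alone is
underspecified.

```python
# One code point, three different octet sequences, depending on encoding.
s = "\U0001F44D"  # U+1F44D THUMBS UP SIGN

print(s.encode("utf-8").hex())      # 'f09f918d' (4 octets)
print(s.encode("utf-16-be").hex())  # 'd83ddc4d' (4 octets, surrogate pair)
print(s.encode("utf-32-be").hex())  # '0001f44d' (4 octets)
```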
>> Two:
>>
>>    Reference to unallocated code points SHOULD NOT be treated as an
>>    error; the corresponding octets SHOULD be processed using the system
>>    default method for denoting an unallocated or undisplayable code
>>    point.
>>
>> I suggest the same change. It's -maybe- more debatable. But this
> I find myself wanting to retain octet here.
It would be okay to leave it as is if you make the linkage explicit in
the previous text by using the text provided by Ned.
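As an aside, a receiver-side check for unallocated code points might
look like the sketch below (Python; note that unicodedata reflects
whatever Unicode version the interpreter ships, so the answer for a
given code point can change over time — U+0378 is unassigned as of
current Unicode versions):

```python
import unicodedata

def is_unallocated(cp: str) -> bool:
    """True if the code point has general category Cn (unassigned).
    The result depends on the Unicode version this Python ships with."""
    return unicodedata.category(cp) == "Cn"

print(is_unallocated("\u0378"))      # True  (U+0378 is currently unassigned)
print(is_unallocated("\U0001F44D"))  # False (U+1F44D THUMBS UP SIGN)
```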
Regards, Martin.
> Again, it makes the linkage
> between code point and octet explicit. Further, this text involves raw
> data that can't be processed normally, and octet has no semantics beyond
> saying 8-bit, whereas code point invokes substantial semantics.
>
> Thoughts?
>
> d/
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call