On Tue, Mar 2, 2021, at 9:15 AM, John C Klensin wrote:
I don't know whose concern was to make that particular switch and why, but my concern about either (and, I'm guessing, Martin's) is that almost all Unicode code points (those outside the ASCII range) require more than one octet to represent in any encoding scheme. For UTF-8, which the I-D requires, the number of octets is variable. So using "octet" as a unit of -- well, much of anything -- is, at best, confusing.
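(To make that variable-width point concrete, here is a quick Python sketch; the sample characters are just ones I picked:)

    # Octets needed to encode a few code points in UTF-8:
    for ch in ["A", "é", "€", "😀"]:
        print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} octet(s)")
    # U+0041 -> 1, U+00E9 -> 2, U+20AC -> 3, U+1F600 -> 4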
"octet" appears in two places.
One:
The rule emoji_sequence is inherited from [Emoji-Seq]. It defines a
set of octet sequences, each of which forms a single pictograph.
I would replace "octet" with "code point". The referenced document only describes sequences of code points. The encoding of those into octets is orthogonal, and will be described by the content-type and content-transfer-encoding jointly. So, I think this change is a definite improvement to accuracy, and is worth making.
Two:
Reference to unallocated code points SHOULD NOT be treated as an
error; the corresponding octets SHOULD be processed using the system
default method for denoting an unallocated or undisplayable code
point.
I suggest the same change. It's -maybe- more debatable. But this document describes what to do with the decoded content: it says nothing about C-T-E or charset decoding, so we must assume that the decoding layer has done its job and that we now have either a total error or a code point sequence. (Some decode layers will have been instructed to hand back REPLACEMENT CHARACTER when the octet sequence was mangled; that will not be a valid emoji sequence, and everything works out.)
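(Sketch of that last case, again in Python; the exact octets are just an example of a truncated UTF-8 sequence:)

    mangled = b"\xf0\x9f\x91"            # truncated UTF-8 (start of U+1F44D)
    decoded = mangled.decode("utf-8", errors="replace")
    print(repr(decoded))                 # contains U+FFFD, REPLACEMENT CHARACTER
    # U+FFFD is not a valid emoji sequence, so the "undisplayable code point"
    # handling above applies and everything still works out.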
--
rjbs