Re: [Last-Call] Genart last call review of draft-crocker-inreply-react-07

Kjetil Torgrim Homme <kjetilho@xxxxxxxxxx> · Thu, 28 Jan 2021 09:21:23 +0100

On Wed, 2021-01-27 at 19:35 -0800, Dave Crocker wrote:
> On 1/27/2021 6:32 PM, Dale Worley via Datatracker wrote:
> >     The rule emoji_sequence is inherited from [Emoji-Seq].  It permits
> >     one or more bytes to form a single presentation image.
> > 
> > I haven't traced the definition of emoji_sequence, but it seems to be
> > essentially a set of Unicode characters that have one or another of
> > certain attributes.  That is perfectly sensible.  But if I understand
> > correctly, "emoji_sequence" is a sequence of characters, and you want
> > to say "In the UTF-8 encoding, some of these characters may be encoded
> > as multiple bytes." or something like that.
> 
> Sorry but I'm not understanding what clarity this provides, over the 
> existing text.
> 
> To the extent that your intent is to say that a) this is a subset of 
> UTF-8, and b) multiple bytes can be used, I think that's built into the 
> definition of emoji-sequence.
> 
> In fact, I had added the "one or more" text mostly to highlight that the
> 'sequence' can be only one byte, since 'sequence' would be expected to
> be read as meaning multiple.

One small change here which will reduce the amount of confusion is to
avoid the word "byte".  Indeed, it is *not* possible for the sequence
to be only one byte, since there are no Unicode code points in the
range U+0000 to U+007F with the Emoji property set.

So, use "emoji characters" or "code points" instead?
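
To make the code point versus octet distinction concrete, here is a
minimal Python sketch.  The helper name utf8_octet_count and the three
sample code points are chosen purely for illustration; nothing here is
taken from the draft.  It only counts how many octets UTF-8 needs for a
single code point, and says nothing about which code points qualify as
emoji.

```python
def utf8_octet_count(code_point: int) -> int:
    """Return how many octets UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


# Illustrative samples only: one ASCII letter and two symbols outside ASCII.
samples = {
    "U+0041 LATIN CAPITAL LETTER A": 0x0041,
    "U+263A WHITE SMILING FACE": 0x263A,
    "U+1F44D THUMBS UP SIGN": 0x1F44D,
}

for name, cp in samples.items():
    print(f"{name}: {utf8_octet_count(cp)} octet(s)")
```

Only code points up to U+007F fit in a single octet; anything above
that takes two to four, which is why counting in "bytes" and counting
in "code points" give different answers.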

(I tend to avoid the use of "byte" in favour of "octet" to forestall
complaints from the old DEC-10, DEC-20 and Cray users anyway ☺)

> >     Reference to unallocated code points SHOULD NOT be treated as an
> >     error; associated bytes SHOULD be processed using the system default
> >     method for denoting an unallocated or undisplayable code point.
> > 
> > Code points that do not have the requisite attributes to qualify as
> > part of an emoji_sequence should also not be treated as an error,
> > although you probably want to allow the system to alternatively
> > display them normally (rather than as an unallocated or undisplayable
> > code point).
> 
> I think your comment addresses a different issue than the cited text is 
> meant for, but I also might be misunderstanding.

Probably, but I think it bears saying something about how to handle
code points without the Emoji property set.  IMHO they should be
handled as undisplayable.

> For whatever reasons, including not having been allocated by the Unicode 
> folks, or possibly by running an older system that thinks a code point 
> is not allocated, there is an issue of how the system should deal with 
> encountering such a code point.  The text here is merely trying to say 
> "do whatever you do".
> 
> A different issue is encountering a code-point, here, that is outside of 
> the emoji-sequence set. The text doesn't try to tell the receiver how to 
> process bytes that are illegal here.

The above suggestion would still allow the implementer sufficient
leeway.

-- 
kind regards,
Kjetil T.
