Re: [Last-Call] Genart last call review of draft-crocker-inreply-react-07

worley@xxxxxxxxxxx (Dale R. Worley) · Sun, 31 Jan 2021 17:16:48 -0500

Dave Crocker <dcrocker@xxxxxxxx> writes:
> On 1/27/2021 6:32 PM, Dale Worley via Datatracker wrote:
>> Reviewer: Dale Worley
>> Review result: Ready with Nits

First, to deal with the straightforward points:

>>     The emoji(s) express a recipient's summary reaction to the specific
>>     message referenced by the accompanying In-Reply-To header field.
>>     [Mail-Fmt].
>>
>> This is not specific as to where the In-Reply-To header is.  I assume
>> you want to say that it is a header of the parent multipart component
>> of "Reaction" part.  Or perhaps this should be forward-referenced to
>> the discussion in section 3.
>
> I don't understand the concern.  An In-Reply-To header field is part of 
> the message header.  That is, it will be in the header of the response 
> message.

Given that we're dealing with multipart messages, an In-Reply-To header
could be placed in the message header, but it could also be placed in
the headers of any part.  I don't know whether it's ever done, but it's
certainly plausible: if I attach a reply that I had received to another
email I send, the In-Reply-To header of the received email would show up
in the headers of the attachment part, not in the header of my message
as a whole.

In general, the situation is one of unlimited complexity.

I'm not particular about what rules you specify.  I just want it to be
clear, when I'm looking at a part with this Content-Disposition
somewhere in a multipart structure (or in a message with no parts at
all), which sets of headers I need to examine to find the In-Reply-To
header.

In practice, I think it has to be either in the headers of the part
with disposition "reaction" or in the headers of the multipart
containing that part.  But whatever the rule is, it should be stated.
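To make the question concrete, here is a minimal Python sketch of one
possible rule: look for In-Reply-To on the "reaction" part itself, then
on each enclosing multipart, working outward to the top-level message
header.  This is only the rule suggested above, not anything the draft
currently states.  The function name find_in_reply_to and the sample
message are hypothetical, and the sketch deliberately does not descend
into message/rfc822 attachments, so an attached message's In-Reply-To
is never picked up.

```python
from email import message_from_bytes


def find_in_reply_to(msg):
    """Hypothetical lookup rule: for a part with Content-Disposition "reaction",
    check that part's own headers first, then each enclosing multipart, outward
    to the top-level message header. Returns None if nothing is found."""

    def walk(part, ancestors):
        if part.get_content_disposition() == "reaction":
            # Innermost first: the part itself, then its parents, nearest first.
            for candidate in [part] + ancestors[::-1]:
                value = candidate.get("In-Reply-To")
                if value is not None:
                    return value
            return None
        # Only descend into multipart/* containers; an attached message/rfc822
        # has its own header block and is not searched.
        if part.get_content_maintype() == "multipart":
            for child in part.get_payload():
                found = walk(child, ancestors + [part])
                if found is not None:
                    return found
        return None

    return walk(msg, [])


# Illustrative message: In-Reply-To appears only in the top-level header.
SAMPLE = b"""From: a@example.com
To: b@example.com
Subject: Re: hello
In-Reply-To: <orig@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset=utf-8
Content-Disposition: reaction

\xf0\x9f\x91\x8d
--XX--
"""

print(find_in_reply_to(message_from_bytes(SAMPLE)))  # <orig@example.com>
```

A different rule (for example, "top-level header only") would be a
one-line change, which is rather the point: the draft needs to say
which one it means.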

>>     Reference to unallocated code points SHOULD NOT be treated as an
>>     error; associated bytes SHOULD be processed using the system default
>>     method for denoting an unallocated or undisplayable code point.
>>
>> Code points that do not have the requisite attributes to qualify as
>> part of an emoji_sequence should also not be treated as an error,
>> although you probably want to allow the system to alternatively
>> display them normally (rather than as an unallocated or undisplayable
>> code point).
>
> I think your comment addresses a different issue than the cited text is 
> meant for, but I also might be misunderstanding.
>
> For whatever reasons, including not having been allocated by the Unicode 
> folks, or possibly by running an older system that thinks a code point 
> is not allocated, there is an issue of how the system should deal with 
> encountering such a code point.  The text here is merely trying to say 
> "do whatever you do".

The text is a constraint, though.  It *requires* (sort of) that if the
bytes in the part form a character that the receiver considers
unallocated, the receiver *should not* reject the whole message as
ill-formed.  The implementation has great freedom in how to display the
character, but the message as a whole "SHOULD NOT be treated as an
error".

> A different issue is encountering a code-point, here, that is outside of 
> the emoji-sequence set. The text doesn't try to tell the receiver how to 
> process bytes that are illegal here.

Perhaps that is what you intend, and if so, the text is correct.  But it
seems to me that if the bytes form a code point that the receiver
considers to be allocated but not an emoji, it should be under the same
constraint that it should not reject the message as a whole as erroneous.
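A rough Python sketch of the combined behaviour argued for above, under
stated assumptions: nothing in the part causes the message to be
rejected; bytes that don't decode, and code points the receiver
considers unallocated, get a stand-in "unknown character" treatment
(U+FFFD here, chosen for illustration); and allocated characters that
simply aren't emoji are displayed normally.  The function name is
hypothetical, and the sketch does not check emoji_sequence membership
at all, since Python's standard library has no emoji property data.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def reaction_for_display(data):
    """Sketch: turn the bytes of a reaction part into displayable text without
    ever rejecting the message. Malformed UTF-8 and code points this system's
    Unicode tables consider unassigned both become U+FFFD; every other
    character, emoji or not, is passed through to be displayed normally."""
    text = data.decode("utf-8", errors="replace")  # never raises
    out = []
    for ch in text:
        # General category "Cn" = unassigned (also covers noncharacters).
        # unicodedata reflects Python's bundled Unicode version, so an older
        # interpreter may report newer emoji as unassigned.
        if unicodedata.category(ch) == "Cn":
            out.append(REPLACEMENT)
        else:
            out.append(ch)
    return "".join(out)


print(ascii(reaction_for_display("\U0001F44D".encode("utf-8"))))  # '\U0001f44d'
print(ascii(reaction_for_display(b"A\xcd\xb8")))  # 'A\ufffd'  (U+0378 is unassigned)
```

Note that the "older system" case Dave mentions falls out naturally:
whether a code point counts as unallocated depends on the receiver's
own Unicode tables, not on the sender's.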

Now for the messy part:

>     The rule emoji_sequence is inherited from [Emoji-Seq].  It permits
>     one or more bytes to form a single presentation image.

First, let me say I keep a rigid category distinction between
bytes/octets and characters.  And in this situation, it seems like there
are *three* layers of composition between bytes and displayed items:

- The UTF-8 encoding groups bytes into code points, which are generally
  Unicode "characters".

- The code points can be composed (by Unicode rules) into characters;
  Barry's example is creating “á” from “a” plus a combining acute
  accent.  But I'm not very familiar with how that is done, or with how
  it affects exactly what the word "character" means.  (I also do not
  know whether any emoji code point participates in Unicode composition,
  but a sender can certainly compose reactions containing code points
  that participate in composition, and there is probably no guarantee
  that Unicode will never do such a thing with emoji.)

- Groups of characters may be displayed as single images.  As Barry
  explains, "the sort of thing that’s unique to emoji, wherein the
  emojis for man followed by woman followed by boy, each of which is a
  separate emoji character that would be displayed as it seems, will
  often be rendered as a single image of a family".

Composed together, these processes turn the bytes/octets (the encoded
form of the "reaction" part) into a sequence of displayed images.
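To put concrete counts on the three layers, a small Python sketch using
only the standard library.  The sample strings are chosen for
illustration; in particular, the family example uses U+200D ZERO WIDTH
JOINER between the three emoji, which is how Unicode's emoji ZWJ
sequences are encoded (that detail is an addition here, not part of
Barry's description).

```python
import unicodedata

# Layer 1: UTF-8 groups bytes into code points.
thumbs_up = b"\xf0\x9f\x91\x8d"          # UTF-8 encoding of U+1F44D
layer1 = thumbs_up.decode("utf-8")
print(len(thumbs_up), len(layer1))        # 4 1

# Layer 2: Unicode canonical composition (NFC) combines
# "a" + U+0301 COMBINING ACUTE ACCENT into U+00E1.
decomposed = "a\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), composed == "\u00e1")  # 2 1 True

# Layer 3: man, woman, boy joined by U+200D ZERO WIDTH JOINER.
# NFC leaves this alone; whether it shows as one "family" image depends
# on the renderer's emoji support, not on anything in the string itself.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F466"
print(len(family.encode("utf-8")), len(family),
      unicodedata.normalize("NFC", family) == family)  # 18 5 True
```

So 18 bytes become 5 code points, which normalization leaves as 5, and
which a capable renderer may then draw as a single image.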

When I wrote my review, I was aware only of the first composition layer.
But now, it's not clear to me what the sentence "It permits one or more
bytes to form a single presentation image." is intended to say.  The
combining of bytes to form an image may happen at any of the three
layers, and it seems to me that the entire process would be better
described as "It permits one or more bytes to form one or more
presentation images."  But maybe you're trying to say something more
specific.
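One way to see why "one or more bytes to form one or more presentation
images" seems the more accurate description: a deliberately simplified
Python approximation that groups decoded code points into likely
display units by attaching combining marks (including variation
selectors such as U+FE0F), emoji skin-tone modifiers, and anything
following a zero width joiner to the preceding unit.  This is not
Unicode's actual extended grapheme cluster algorithm (UAX #29), and the
function name approx_images is hypothetical; whether a renderer really
draws one image per unit also depends on its font and emoji data.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _attaches(ch):
    """True if ch joins the preceding unit in this simplified model."""
    cp = ord(ch)
    return (
        ch == ZWJ
        or 0x1F3FB <= cp <= 0x1F3FF                    # emoji skin-tone modifiers
        or unicodedata.category(ch).startswith("M")    # combining marks, incl. U+FE0F
    )


def approx_images(data):
    """Group the bytes of a reaction part into approximate display units."""
    text = data.decode("utf-8", errors="replace")
    units = []
    for ch in text:
        # Attach to the previous unit if ch extends it, or if the previous
        # unit ends with a joiner; otherwise start a new unit.
        if units and (_attaches(ch) or units[-1].endswith(ZWJ)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


two_thumbs = "\U0001F44D\U0001F44D".encode("utf-8")                    # 8 bytes
family = "\U0001F468\u200d\U0001F469\u200d\U0001F466".encode("utf-8")  # 18 bytes
print(len(approx_images(two_thumbs)), len(approx_images(family)))     # 2 1
```

Eight bytes yield two images and eighteen bytes yield one, which is
hard to capture with "a single presentation image".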

Dale

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call