Hello Dave, others,
On 03/03/2021 23:40, Dave Crocker wrote:
> I'm finally able to get some time for this. And I'm finding myself
> thinking of the interaction between ietf perspective and unicode
> perspective. ietf perspective uses the term octet. I think there can
> be some benefit in mixing the terms, to try to connect them, for the
> reader.
This may be an okay 50,000-foot summary, but it is in no way
appropriate for an actual protocol spec. Also, there are many IETF specs
that use Unicode code points and many parts of Unicode that use the term
octet. The term octet is as appropriate e.g. for MTU or HTTP
Content-Length as it is for the result of encoding characters in UTF-8.
The term codepoint (or code point) is as appropriate e.g. in RFC 3987 as
it is somewhere in the Unicode spec.

"There may be some benefit of mixing the terms" sounds extremely vague,
and the text below indeed turns out that way. The connection between
these terms has to be very precise.
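To make the distinction concrete, here is a quick Python sketch (the
emoji modifier sequence chosen is just an example): the sequence is two
code points regardless of encoding, while the number of octets falls out
of the encoding chosen.

```python
# The same emoji sequence, viewed as code points vs. as octets.
# U+1F44D (THUMBS UP SIGN) + U+1F3FD (skin tone modifier) is two
# code points; its octet form depends entirely on the encoding.
s = "\U0001F44D\U0001F3FD"

code_points = [hex(ord(c)) for c in s]
print(code_points)                 # ['0x1f44d', '0x1f3fd']

utf8_octets = s.encode("utf-8")
print(len(s), len(utf8_octets))    # 2 code points, 8 octets
```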
> Consequently, I propose:
>
> On 3/2/2021 6:34 AM, Ricardo Signes wrote:
>> One:
>>
>>    The rule emoji_sequence is inherited from [Emoji-Seq]. It defines a
>>    set of octet sequences, each of which forms a single pictograph.
>>
>> I would replace "octet" with "code point". The referenced document
>> only describes sequences of code points. The encoding of those into
>> octets is orthogonal, and will be described by the content-type and
>> content-transfer-encoding jointly. So, I think this change is a
>> definite improvement to accuracy, and is worth making.
> NEW:
>
>    <t>The ABNF rule emoji_sequence is inherited from <xref
>    target="Emoji-Seq"/>. It defines a set of octet sequences, each of
>    which forms a single pictograph.
Sorry, this is wrong. The ABNF rule in [Emoji-Seq] does not define
a set of octet sequences. It defines a set of code point sequences, and
whether or how those end up as octet sequences is left undefined in that
document.
>    The BNF syntax used in [Emoji-Seq] differs
>    from <xref target="ABNF"/>, and MUST be interpreted as used in Unicode
>    documentation. The referenced document describes these as sequences of
>    code points.
So how do you get octets from those code points? You don't say. Please
just use the wording that Ned provided. That wording makes things clear.
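The point can be shown in a couple of lines (a Python sketch; the code
point is arbitrary): the code point sequence is fixed by [Emoji-Seq],
but the octets you get depend entirely on which charset is chosen
elsewhere, e.g. via Content-Type, so "octet sequence" alone is
underspecified.

```python
# One code point, three different octet sequences, depending on encoding.
s = "\U0001F44D"  # U+1F44D THUMBS UP SIGN

print(s.encode("utf-8").hex())      # 'f09f918d' (4 octets)
print(s.encode("utf-16-be").hex())  # 'd83ddc4d' (4 octets, surrogate pair)
print(s.encode("utf-32-be").hex())  # '0001f44d' (4 octets)
```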
>> Two:
>>
>>    Reference to unallocated code points SHOULD NOT be treated as an
>>    error; the corresponding octets SHOULD be processed using the system
>>    default method for denoting an unallocated or undisplayable code
>>    point.
>>
>> I suggest the same change. It's -maybe- more debatable. But this
> I find myself wanting to retain octet here.
It would be okay to leave it as is if you make the linkage explicit in
the previous text by using the text provided by Ned.
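As an aside, a receiver-side check for unallocated code points might
look like the sketch below (Python; note that unicodedata reflects
whatever Unicode version the interpreter ships, so the answer for a
given code point can change over time — U+0378 is unassigned as of
current Unicode versions):

```python
import unicodedata

def is_unallocated(cp: str) -> bool:
    """True if the code point has general category Cn (unassigned).
    The result depends on the Unicode version this Python ships with."""
    return unicodedata.category(cp) == "Cn"

print(is_unallocated("\u0378"))      # True  (U+0378 is currently unassigned)
print(is_unallocated("\U0001F44D"))  # False (U+1F44D THUMBS UP SIGN)
```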
Regards, Martin.
> Again, it makes the linkage
> between code point and octet explicit. Further, this text involves raw
> data that can't be processed normally, and octet has no semantics beyond
> saying 8-bit, whereas code point invokes substantial semantics.
>
> Thoughts?
>
> d/
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call