RE: Last Call: draft-klensin-unicode-escapes (ASCII Escaping of Unicode Characters) to BCP

Peter Constable <petercon@xxxxxxxxxxxxx> · Mon, 22 Oct 2007 08:08:51 -0700

From: Stephane Bortzmeyer [mailto:bortzmeyer@xxxxxx]
Sent: Monday, October 22, 2007 4:03 AM

>> Also, "a further encoding of the encoding form" isn't going to be
>> clear to readers.
>
> It is a reference to a bad practice (used in URLs, for instance) to
> encode twice (for instance in UTF-8, then in %xx escapes of the
> bytes).

The discussion in that section is about references to characters in general human-readable content, not in URLs. If that is what the wording is referring to, it's extremely opaque. If that's really what the authors intend to talk about, it should be explained -- and the section should be organized better so that it makes sense why that particular thing is being discussed.
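For concreteness, here is a minimal Python sketch of the two-step practice Stephane describes: the character is first put into an encoding form (UTF-8), and the resulting bytes are then escaped a second time as %xx. The sample character and the variable names are my own choices for illustration, not anything taken from the draft.

```python
from urllib.parse import quote_from_bytes, unquote

# Illustrative character (an assumption, not from the draft):
# U+00E9 LATIN SMALL LETTER E WITH ACUTE.
ch = "\u00e9"

# Step 1: the encoding form. UTF-8 turns the code point into bytes.
utf8_bytes = ch.encode("utf-8")

# Step 2: the "further encoding". Each UTF-8 byte becomes a %xx escape.
escaped = quote_from_bytes(utf8_bytes, safe="")

# For comparison, a direct reference to the code point itself.
code_point_ref = f"U+{ord(ch):04X}"

print(utf8_bytes)      # the raw UTF-8 bytes
print(escaped)         # the %xx form, two layers away from the code point
print(code_point_ref)  # the code point reference
print(unquote(escaped) == ch)  # decoding both layers recovers the character
```

A reader shown only the escaped form has to undo both layers before reaching the code point, which is presumably why the practice is called bad when the audience is people rather than URL parsers.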

>>   "However, when information about characters is to be processed by
>>   people, reference to the Unicode code point is preferable to
>>   encoded representations of the code point."
>
> That's not more clear to me.

How can it not be clear? Human-readable content is discussing a Unicode character and needs to refer to the character in some way. The whole point of this document is about how to refer. Since Unicode character identity is established by the name, the code point and the reference glyph, reference can be made using one of those three things. It appears to me that this document focuses on references based in some way on the code point: is not the key distinction between the code point itself and some encoded representation of the code point?
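To make the distinction concrete, here is a rough Python sketch that sets the three identity-based references (name, code point, glyph) beside two encoded representations of the same code point. The helper name `references` and the sample character are hypothetical, chosen only for illustration; the names returned depend on the Unicode data shipped with the Python build.

```python
import unicodedata


def references(ch: str) -> dict:
    """Collect ways of referring to a single character (hypothetical helper)."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    return {
        # References tied to the character's identity.
        "name": unicodedata.name(ch),
        "code_point": f"U+{ord(ch):04X}",
        "glyph": ch,
        # Encoded representations of the code point.
        "utf8": " ".join(f"{b:02X}" for b in utf8),
        "utf16": " ".join(
            utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)
        ),
    }


# Sample character outside the BMP, so the encoded forms differ visibly
# from the code point value.
info = references("\U0001D11E")
for key, value in info.items():
    print(f"{key}: {value}")
```

For a character outside the BMP, neither the UTF-8 bytes nor the UTF-16 surrogate pair resembles the code point value, so a reader given only an encoded form has to decode it before looking the character up.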

Peter Constable
