[Last-Call] Re: Last Call: <draft-bray-unichars-10.txt> (Unicode Character Repertoire Subsets) to Proposed Standard

John C Klensin <john-ietf@xxxxxxx> · Sun, 09 Feb 2025 18:50:10 -0500

--On Sunday, February 9, 2025 14:06 -0800 Rob Sayre
<sayrer@xxxxxxxxx> wrote:

> On Sun, Feb 9, 2025 at 10:20 AM John C Klensin
> <john-ietf@xxxxxxx> wrote:
> 
>> On the other hand, they are probably inappropriate as they stand.
>> As a trivial example, private use code points are used, as the
>> document (and the Unicode specs) indicate, by private agreements
>> among cooperating parties.  But different cooperating parties may
>> have very different uses for them and use them differently, making
>> them a threat to general interoperability.  Therefore saying that
>> they are not problematic and are reasonable for use in
>> general-purpose Unicode subsets is, well, problematic.
>> 
> 
> To use this example, I think the document does well here. It's fine
> to allow private use code points in general protocols like HTTP.
> How else are they to be sent? Think of free-text fields like this
> message. I can understand the need for more restrictive
> specifications in fields with narrower requirements (like a URL).
> What I do agree with is that you probably don't want to be sending
> around obsolete control characters unless you really mean it.

Rob,

With the caution that I think the review -- and the conclusion that
the document is not ready for RFC publication on Standards Track (if
at all) and that the proposed PRECIS profiles do not meet that
specification's requirements -- stand on their own and that I am not
going to have the time to respond to individual differences of
opinion, our difference in opinion might be important and indicative
of broader issues.

As part of that, I should probably stress that, of the many issues I
have with the document, the inclusion of private use code points is
among the least important.  That said...

If a private agreement can be reached among some parties to use
superficially valid Unicode code points in some special way that they
(and probably only they) know how to interpret and that they don't
care how others might interpret (or even prefer that others cannot do
so reliably), that is fine for them.  That might be private use ones,
nominally unassigned ones, C1 controls that the parties decided to
give special meanings, etc.  It would be equally fine if they,
privately, agreed to use Unicrypt or Multicode -- whatever those
might be -- instead of Unicode.  If some intermediate system (like a
browser or, if email were involved, an anti-spam tool) looked at the
message and decided it violated some rule and should hence be
discarded or accessible only after agreement to fierce warnings,
perhaps that would be so much the better, even from the standpoint of
those parties.  Something of the same thing could be said about
private agreements to use some non-standard variation on, e.g., IP
packets.

One could make almost the same argument for private agreements to
deliberately break Unicode, PRECIS, or W3C prohibitions on, e.g.,
surrogate code point use -- those who were party to the agreement
would know what was intended; everyone else would be confused about
how to interpret the string and would probably not try.  And, if two
sets of parties use the same code points for different purposes,
confusion would result and they might like that.
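
As an illustrative aside, the classes of code points named above
(private use, surrogates, non-characters, C1 controls, unassigned) can
be told apart mechanically.  What follows is a minimal Python sketch,
not anything taken from the draft: the function name classify() and
its labels are hypothetical, and the "unassigned" result depends on
the Unicode version bundled with the Python build.

```python
import unicodedata

def classify(ch: str) -> str:
    """Return a coarse, illustrative label for a single code point."""
    cp = ord(ch)
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points
    # (xxFFFE, xxFFFF) of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    cat = unicodedata.category(ch)  # Unicode General_Category
    if cat == "Co":
        return "private-use"
    if cat == "Cs":
        return "surrogate"
    if cat == "Cc":
        # C1 controls occupy U+0080..U+009F; other Cc are C0 or DEL.
        return "c1-control" if 0x80 <= cp <= 0x9F else "other-control"
    if cat == "Cn":
        return "unassigned"  # in this Python build's Unicode version
    return "assigned"

if __name__ == "__main__":
    for cp in (0xE000, 0xD800, 0x0085, 0xFFFE, 0x0041):
        print(f"U+{cp:04X}", classify(chr(cp)))
```

A check like this can only say which class a code point belongs to.
It cannot say what a private use code point means, which is the
interoperability problem at issue here.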

From a security standpoint and going back to our discussions about
pervasive surveillance, such private agreements about character
codings have an extra advantage, which is that using private-use
characters (or non-characters, odd controls, unassigned code points,
etc.) could be a way to get messages through the network that might
be blocked or treated with extra suspicion by a government that
believed it had the right (or obligation) to inspect traffic,
especially traffic that was obviously encrypted using standardized
tools.

However, having an IETF Standards Track document say, essentially,
that it is ok to use private use code points because some parties
might have private agreements poses, IMO, a problem or two.  First, our
specifications are supposed to be about interoperability, not about
what some parties might agree on, privately, for their own shared
use.  Second, even if this particular case justified ignoring that
principle, either interoperability or security considerations (or
both) would suggest that the issues, tradeoffs, risks of
misinterpretation, etc., need to be discussed in the document, not
just dismissed with what appears to be "use of private code points is
ok if you have some sort of reason".

best,
    john
