--On Sunday, February 9, 2025 14:06 -0800 Rob Sayre <sayrer@xxxxxxxxx> wrote:

> On Sun, Feb 9, 2025 at 10:20 AM John C Klensin
> <john-ietf@xxxxxxx> wrote:
>
>> On the other hand, they are probably inappropriate as they stand.
>> As a trivial example, private use code points are used, as the
>> document (and the Unicode specs) indicate, by private agreements
>> among cooperating parties. But different cooperating parties may
>> have very different uses for them and use them differently, making
>> them a threat to general interoperability. Therefore saying that
>> they are not problematic and are reasonable for use in
>> general-purpose Unicode subsets is, well, problematic.
>>
>
> To use this example, I think the document does well here. It's fine
> to allow private use code points in general protocols like HTTP.
> How else are they to be sent? Think of free-text fields like this
> message. I can understand the need for more restrictive
> specifications in fields with narrower requirements (like a URL).
> What I do agree with is that you probably don't want to be sending
> around obsolete control characters unless you really mean it.

Rob,

With the caution that I think the review -- and the conclusion that the document is not ready for RFC publication on Standards Track (if at all) and that the proposed PRECIS profiles do not meet that specification's requirements -- stands on its own and that I am not going to have the time to respond to individual differences of opinion, our difference in opinion might be important and indicative of broader issues. As part of that, I should probably stress that, of the many issues I have with the document, the inclusion of private use code points is among the least important. That said...
If a private agreement can be reached among some parties to use superficially valid Unicode code points in some special way that they (and probably only they) know how to interpret and that they don't care how others might interpret (or even prefer that others cannot do so reliably), that is fine for them. That might be private use ones, nominally unassigned ones, C1 controls that the parties decided to give special meanings, etc. It would be equally fine if they, privately, agreed to use Unicrypt or Multicode -- whatever those might be -- instead of Unicode. If some intermediate system (like a browser or, if email were involved, an anti-spam tool) looked at the message and decided it violated some rule and should hence be discarded or accessible only after agreement to fierce warnings, perhaps that would be so much the better, even from the standpoint of those parties.

Something of the same thing could be said about private agreements to use some non-standard variation on, e.g., IP packets. One could make almost the same argument for private agreements to deliberately break Unicode, PRECIS, or W3C prohibitions on, e.g., surrogate code point use -- those who were party to the agreement would know what was intended; everyone else would be confused about how to interpret the string and would probably not try. And, if two sets of parties use the same code points for different purposes, confusion would result and they might like that.

From a security standpoint and going back to our discussions about pervasive surveillance, such private agreements about character codings have an extra advantage, which is that using private-use characters (or non-characters, odd controls, unassigned code points, etc.) could be a way to get messages through the network that might be blocked or treated with extra suspicion by a government that believed it had the right (or obligation) to inspect traffic, especially traffic that was obviously encrypted using standardized tools.
However, having an IETF Standards Track document say, essentially, that it is ok to use private use code points because some parties might have private agreements is, IMO, a problem or two. First, our specifications are supposed to be about interoperability, not about what some parties might agree on, privately, for their own shared use. However, even if this particular case justified ignoring that principle, either interoperability or security considerations (or both) would suggest that the issues, tradeoffs, risks of misinterpretation, etc., need to be discussed in the document, not just dismissed with what appears to be "use of private code points is ok if you have some sort of reason".

best,
   john

--
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx