Hi. A few brief comments on this document....

(1) A mailing list for discussion is not designated. While I would normally suggest rfc-interest, the document appears to be written on the assumption that approval of this proposal rests with the IETF (presumably via the IESG) and IAOC and not with the RFC Editor (with presumed review by the IAB), so I am sending to this list.

(2) The document seems to assume that availability of UTF-8 systems (or other systems based on Unicode with easy transcoding) is now near-ubiquitous. Actual experience, especially with documents being transmitted between computers by email and similar means, appears to be different. While I look forward to the day when comprehensive UTF-8 support is universally available, at least as an interchange format, I do not believe that we are there yet. There is still a considerable gap, especially among systems that, instead of being ASCII-only, have been developed with a focus either on ISO 8859-1 or on a national coding system in East Asia.

(3) The document indicates that display systems that cannot properly handle UTF-8 usually display an incorrect character from which the user can make inferences. While that sometimes happens --sometimes with considerable information loss, as we have seen with a common anomaly with quoted sections of email when the system on which the response is being composed and those on both sides do not all prefer UTF-8-- it is at least equally common to see an "undisplayable character" indication that is the same for all such characters, e.g., a small box or question mark. The problem is less likely with RFCs than with random email, but we do, often, quote from RFCs and I-Ds in email messages while working on them. Once those "undisplayable character" indicators are transferred from one system to another, information is irretrievably lost... finding "better display software" (rarely a realistic choice) is not an option for recovering that information.

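To make the distinction in (3) concrete, here is a minimal sketch in Python. The two names and the ASCII-only target are my own illustrative assumptions, not anything taken from the document. It contrasts a misdecoded character (which can be undone because the bytes survive) with a generic "undisplayable" marker (which cannot).

```python
# Illustrative only: the names and the ASCII-only target are assumptions.
# Escapes are used so this source itself survives an ASCII-only path.
names = ["M\u00fcller", "M\u00f6ller"]  # two different names, one letter apart

# A system that cannot represent a character often substitutes one generic
# marker for every such character. Python's "replace" handler uses "?".
flattened = [n.encode("ascii", errors="replace").decode("ascii") for n in names]

# Both names are now the same string; nothing downstream can tell them apart.
names_collide = flattened[0] == flattened[1]

# A wrong-but-distinct character is a different case: UTF-8 bytes misread as
# ISO 8859-1 look wrong, but the bytes are intact and the step can be reversed.
garbled = names[0].encode("utf-8").decode("latin-1")
recovered = garbled.encode("latin-1").decode("utf-8")
```

After the first step both names collapse to the same string, so once that text is quoted onward the original spelling is gone. The second case is recoverable only as long as no intermediate system has substituted a marker along the way.
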
(4) Permitting critical information in RFCs (including any information that is considered normative and author contact information) to be exclusively in non-ASCII UTF-8 creates the possibilities that a would-be implementer may not be able to interpret the document or that it will be impossible to contact the author(s), especially if, as an anti-spam precaution, authors supply postal addresses and not email ones.

(5) I think we could quibble at great length about the advice that should be given about compatibility characters. While it is probably sensible to discourage their use, it is quite easy to imagine cases in which they might be important if a string is to be represented correctly. Those cases specifically include correct spelling of author names in some parts of the world and examples that, for one reason or another, actually have to illustrate the role of those characters... and author names and examples appear to be the main justifications for this proposal. On the other hand, as the authors point out, the issues with input methods and display of compatibility characters are often much more serious than they are with their equivalents, especially when display routines start performing character substitutions for characters for which they lack precise and accurate display capability.

I suggest that the authors concentrate less on painting a rosy picture of how widely UTF-8 is deployed and how easily the problems can be overcome (e.g., "just get better display software [, even if that requires replacing hardware and operating systems]"), and, instead, concentrate on a definition that would provide reasonable and effective fallbacks when things go wrong as, at least for the present, they certainly will.

For example, permitting UTF-8 (with arbitrary non-ASCII characters) by itself in contact information is not sensible for the reasons given above, but permitting UTF-8 only with a requirement for either ASCII transliteration (or equivalent) or RFC 5137 encoding to be present as an alternative might be perfectly sensible at the current level of UTF-8 deployment and availability. Similar comments would apply to references, especially normative ones (the principle that the IETF operates in English and that English, and only English, is needed to understand its technical specifications goes well beyond the question of UTF-8 in RFCs and this document does not appear to intend to change it), and to at least some examples that are necessary to understand the normative text.

   --john