--On Friday, February 3, 2017 14:51 -0500 Viktor Dukhovni <ietf-dane@xxxxxxxxxxxx> wrote:

>...
> Is that right? Thus the verifier would sometimes need to
> convert from U-labels to A-labels (when the localpart is all
> ASCII), and at other times from A-labels to U-labels (when the
> localpart is not all ASCII)...

Viktor, I think there is another issue hidden behind this that is worth mentioning and that interacts with your concern above.

While it may or may not be important for any given protocol in the abstract (it will be for some, but not others), using strings containing non-ASCII characters in ways that interface with users is always going to involve tricky issues that require thought and understanding, not just plugging code points, possibly with an encoding specified, into slots. People who "just" want to be told what to do so they don't need to think about it, or who want to apply a package they don't understand, are sooner or later going to find themselves or their users in trouble, whether the issues are identified as security problems, matching/equivalence errors of various kinds, user confusion due to violation of the law of least astonishment, or something else.

The underlying issues are the result of the wide and very rich diversity of human writing systems and languages -- systems that are diverse enough that almost any simple statement or rule one can come up with will have exceptions. In general, that diversity is something we should celebrate rather than trying to find quick fixes or tricks to get around, not least because those fixes or tricks aren't going to work well for some group of people. Narrow views of the situation just lead to other traps.

In particular, while useful lessons can be learned, one cannot extrapolate from knowledge or experience of Latin-based scripts (even if one knows more than one language that uses them differently) to all others, or from very closely related scripts (e.g., Greek-Latin-Cyrillic, some subsets of Indic (or neo-Brahmi) scripts, or so-called CJK) to writing systems outside those groups, without missing important cases and causing problems elsewhere. Like the kinds of diversity we deal with in some other areas, the differences did not show up overnight. A large fraction of the human population has been creating and practicing them for centuries and, in many cases, tens of centuries.

If IDNA enters the mix, another layer of knowledge and understanding is required. It is actually easier to grasp (or grok) than the above, but may have even greater impact on protocol design. Unlike the above, IDNA is artificial and a recent invention to solve a very specific problem with incremental deployment, a decision most of us, including most of those in the IDN business and those who use non-Latin scripts on a daily basis, think was probably a good idea. People simply need to understand how it works and how it is intended to evolve, with the U-label <-> A-label symmetry and checking requirements as particularly important.

In particular for this case, protocols which reach the user simply need to be ready to handle U-labels and A-labels interchangeably. Because of the combinatorial explosion problem, trying to do that by enumerating the possible FQDNs just won't work -- people may know what, e.g., they intend to push out in email or use in the text of an HTML "a" element, but there are going to be just too many things in the network that will change the form back and forth for their own (perfectly rational) reasons.
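To make that concrete, here is a minimal sketch of the sort of handling I have in mind, written in Python and assuming the third-party "idna" package (an IDNA 2008 implementation); the function names and the choice of A-labels as the comparison form are mine and purely illustrative, not anything a specification mandates:

    # Minimal sketch, assuming the third-party "idna" package (an
    # IDNA 2008 implementation).  Function names and the choice of
    # A-labels as the comparison form are illustrative only.
    import idna

    def to_a_labels(domain: str) -> str:
        """Return the domain with every label in A-label (ACE) form."""
        if domain.isascii():
            # Already ASCII; it may contain A-labels, which are left alone.
            return domain.lower()
        # uts46=True applies the UTS #46 case/compatibility mapping
        # before conversion, so mixed-case U-labels are accepted.
        return idna.encode(domain, uts46=True).decode('ascii')

    def to_u_labels(domain: str) -> str:
        """Return the domain with every A-label decoded to its U-label."""
        return idna.decode(domain)

    def same_domain(a: str, b: str) -> bool:
        """Compare two names on a single canonical (A-label) form."""
        return to_a_labels(a) == to_a_labels(b)

    # e.g. same_domain('пример.example', 'xn--e1afmkfd.example') is True,
    # whichever of the two forms happened to arrive on the wire.

The point is not the particular library, but that the verifier converts once, to one form, and compares there, rather than trying to guess which form it will be handed.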
At least IMO, that makes a very strong argument for protocols defining and using, where possible, a single canonical form and expecting user interfaces to do conversions as needed. You may reasonably disagree with that last conclusion because it is just a protocol design preference, but most of the rest almost certainly must be treated as immutable facts, at least until and unless we all agree to use the same language and the same orthography and writing system for that language.

best,
   john