--On Friday, February 15, 2013 16:48 -0800 Joe Touch <touch@xxxxxxx> wrote:

> If any label were allowed, then why does IDN conversion go so
> far out of its way to exclude particular strings, e.g., those
> beginning/ending with '-' and encodes everything 0..7F into
> a-z/0-9?
>
> (I was focused on looking up A records given FQDNs)

Now you are asking a different question, although the answer for
A RRs generally is still the same. My apologies to those who
don't need the mini-tutorial that follows -- many of those of us
who work primarily in applications would need similar tutorials
to understand the reasons for some of the decisions in the work
of those who primarily operate at lower layers of the stack.

I recommend rereading the relevant sections of RFCs 1123 and
2181 but, briefly, the DNS doesn't impose any limitations other
than what will fit in an octet. However, many, perhaps most,
applications do impose their own rules and those rules usually
match what RFCs 1034/1035 call the "preferred syntax" -- a
syntax derived from popular applications at the time, as those
specifications make clear.

As one example with which I'm painfully familiar, SMTP treats a
domain name containing characters outside the ASCII range as a
syntax violation, and a conforming implementation will never
look up such a domain as part of mail address resolution or
routing. Consequently, something like

   Non-ASCII-String  MX 0 some.domain.example.

is perfectly valid as far as the DNS is concerned but nonsense
as far as actual utility is concerned -- SMTP implementations
are the only users of MX RRs and no conforming SMTP
implementation will ever access such a record.

IDNA is a clever trick (or, from other perspectives, an ugly
hack) that accomplishes two main things:

-- It permits IDNs to be used with applications including, e.g.,
   SMTP, without changing _their_ syntax rules because the
   labels stored in the DNS and transmitted on the wire still
   conform to those "preferred syntax" rules.
-- It warns applications and forces the additional restrictions
   and processing that enable sensible treatment of non-ASCII
   strings. As the most obvious example, the case-insensitive
   matching that the DNS specifies for ASCII strings is not
   defined by the DNS for non-ASCII ones (and, indeed, becomes
   more complicated and language or locale-sensitive for some
   characters).

I don't believe the importance of the second was fully
appreciated when we got started on IDNA. To this day, people who
believe that IDNA can be replaced by simply placing Unicode
strings encoded in UTF-8 into the DNS tend to make proposals
that ignore those issues.

The additional exclusions of IDNA such as prohibition of most
symbols in the Unicode collection and restrictions on the
appearance of "--" in the third and fourth octets of labels
apply only to IDNA implementations and are intended to provide
an extended, but still relatively safe, version of the
historical "preferred syntax" and/or to protect the syntax for
signaling special coding in the (unlikely but not impossible)
event that a different one is needed for some purpose in the
future.

The bottom line is that none of these restrictions, including
the SMTP one and the IDNA ones, are a property or requirement of
the DNS.

   best,
    john
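
As a rough illustration of the distinctions above, here is a
minimal Python sketch. It assumes Python's standard-library
"idna" codec, which implements the older IDNA2003 rules (RFC
3490) rather than IDNA2008, so it is only an approximation of
current practice. The helper names (is_preferred_syntax,
has_reserved_hyphens, to_a_label) are hypothetical and exist
only for this example; the checks model application-level
conventions, not anything the DNS itself enforces.

```python
import re

# "Preferred syntax" label as relaxed by RFC 1123: ASCII letters, digits,
# and hyphens, 1 to 63 octets, no leading or trailing hyphen. This is an
# application convention, not a DNS requirement.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_preferred_syntax(label: str) -> bool:
    """Return True if the label fits the hostname-style LDH convention."""
    return _LDH_LABEL.fullmatch(label) is not None


def has_reserved_hyphens(label: str) -> bool:
    """Return True if "--" occupies the third and fourth positions."""
    return label[2:4] == "--"


def to_a_label(label: str) -> str:
    """Encode one label with the stdlib codec (IDNA2003; an assumption)."""
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    u_label = "b\u00fccher"  # a label containing a non-ASCII character
    a_label = to_a_label(u_label)
    print(a_label)
    print(is_preferred_syntax(u_label), is_preferred_syntax(a_label))
    print(has_reserved_hyphens(a_label))
```

The encoded form passes the "preferred syntax" check while the
original Unicode form does not, which is why applications such
as SMTP can carry it without changing their own syntax rules.
The "--" in positions three and four is the signal that special
coding is in use.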