Case-insensitive lookup is likely to be a similar problem regardless of
whether you use "raw" UTF-8 or encode things in 7-bit ASCII.

To me the real issue, however, seems to be in the applications, i.e. in
the resolvers (libresolv.*/gethostbyname()/gethostbyaddr()) and in all
the places where the resulting names are to be processed:

    netdb.h:
        char  *h_name;
        char **h_aliases;

It would be, er, hm, unwise to assume that all applications would do
The Right Thing (TM) when the "char *" starts to carry UTF-8 data (which
is 8-bit, unless it is the ASCII equivalent). And do remember that these
strings can come to you without you asking for them; it's NOT only www
URLs. Just think of spam

    "From nobody@räksmörgås-reklam.se"

(SE 8859-1, although nicely encoded in UTF-8, 8-bit).

Now, the IT industry has been declining over the last months, but I
doubt it would do any good to create such a massive amount of extra
work.

MIME QP (agreed it's a kludge, agreed it's ugly) did save us from a lot
of problems (i.e. no Flag Day needed). Don't forget that.

	Gunnar Lindberg

>From listadm@loki.ietf.org Fri Apr 5 22:21:17 2002
>Date: Fri, 05 Apr 2002 14:41:53 -0500
>From: John C Klensin <klensin@jck.com>
>To: Robert Elz <kre@munnari.OZ.AU>
>cc: ietf@IETF.ORG
>Subject: Re: [idn] Re: 7 bits forever!
>Message-ID: <9863660.1018017713@localhost>
>In-Reply-To: <2228.1018021996@brandenburg.cs.mu.OZ.AU>
>References: <2228.1018021996@brandenburg.cs.mu.OZ.AU>

>--On Friday, 05 April, 2002 22:53 +0700 Robert Elz
><kre@munnari.OZ.AU> wrote:

>> Date: Thu, 4 Apr 2002 09:50:01 -0800 (PST)
>> From: "Gary E. Miller" <gem@rellim.com>
>> Message-ID:
>> <Pine.LNX.4.44.0204040931110.10828-100000@catbert.rellim.com>
>>
>> | Maybe it can, but that does not make it right.
>> |
>> | RFC 1035 "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION"
>> |
>> | 2.3.1
>>
>> If you actually go read that section, carefully, instead of
>> just quoting the part from it that everyone notices first, you
>> will see that it says something quite different from what you
>> think it does.
>>
>> You need to read the part of the section that appears on the
>> preceding page of the formatted RFC...
>>
>> Or see (part of) rfc2181 for a longer version of this.

>Actually, having read that section, and several other sections,
>_very_ carefully in recent months, I think 2181 is contradictory
>at best, and possibly seriously wrong, on this point.

>As I read them, what 1034 and 1035 say is that the DNS can
>accommodate any octets, but that [at least then]
>currently-specified RRs are restricted to ASCII. The LDH rule
>is a good ("best"?) practices one. It is the LDH rule that RFC
>1123 modified slightly. And it is quite correct to assert that
>the LDH rule is not a _DNS_ requirement.

>But the ASCII rule is a firm requirement. For evidence of this,
>temporarily ignore the text (although, personally, I think it is
>clear -- especially in 2.3.3 -- if read carefully) and examine
>the requirement that, for the defined RRs, labels and queries be
>compared in a case-insensitive way. For ASCII, that is a
>well-defined operation, one that can be performed by doing the
>comparison under a bit mask. For other scripts, as the IDN WG
>discovered, "case insensitive comparison" is typically not
>completely well-defined, often involves complex tables and/or
>knowledge of local context, and is sometimes quite controversial
>as to what is intended.

>So I believe that the "future RRs" language with regard to
>binary labels in 1034 and 1035 must be taken seriously and as
>normative text: if new RRs (or new classes) are defined, they
>can be defined as binary and, hence, as not requiring
>case-insensitive comparisons.
>Conversely, within the current
>set (or at least the historical set at the time of 1034/1035),
>case-insensitive comparison is required and hence binary must
>not be permitted.

>Any other reading, I believe, leads immediately either to
>contradictions or to undefined states within the protocol.

>As an aside, it appears to me that this requirement for
>case-insensitive comparison is the real problem with "just put
>UTF-8 in the DNS" approaches. An existing and conforming
>implementation has no way to do those required case-insensitive
>comparisons outside the ASCII range. Worse, if it does those
>comparisons by bit-masking (which would be conforming today),
>there is a risk of its getting rather bizarre errors (of either
>matching or not matching) on characters outside the ASCII range.

>One supposes that we could modify the protocol to specify that
>case-insensitive comparisons be made only for octets in the
>ASCII range, but, unless that were done through an EDNS option,
>it would be a potentially fairly significant retroactive change.

>     john
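
To make the bit-mask point in the quoted text concrete, here is a
minimal sketch in C. It is not taken from any resolver; the function
names (eq_bitmask, eq_ascii_only, label_compare_examples) are
hypothetical, and it assumes that "comparison under a bit mask" means
clearing bit 0x20 in every octet before comparing. The second function
sketches the "fold only octets in the ASCII range" alternative that the
last quoted paragraph speculates about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compare n octets after clearing bit 0x20 in
 * each one. For ASCII letters this folds 'a'..'z' onto 'A'..'Z'. */
int eq_bitmask(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if ((a[i] & 0xDF) != (b[i] & 0xDF))
            return 0;
    }
    return 1;
}

/* Fold ASCII lower-case letters only; every other octet is unchanged. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Hypothetical helper: case-insensitive for ASCII letters, exact
 * octet comparison for everything else (including all UTF-8 bytes). */
int eq_ascii_only(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (fold_ascii(a[i]) != fold_ascii(b[i]))
            return 0;
    }
    return 1;
}

/* Illustrative inputs, chosen for this sketch only. */
void label_compare_examples(void)
{
    /* UTF-8 for U+00D7 (multiplication sign) and U+00F7 (division sign):
     * two unrelated characters whose octets differ only in bit 0x20. */
    static const unsigned char mul_sign[] = { 0xC3, 0x97 };
    static const unsigned char div_sign[] = { 0xC3, 0xB7 };
    /* UTF-8 for U+00FF and U+0178, a lower/upper case pair in Unicode. */
    static const unsigned char y_lower[] = { 0xC3, 0xBF };
    static const unsigned char y_upper[] = { 0xC5, 0xB8 };

    assert(eq_bitmask(mul_sign, div_sign, 2));     /* spurious match  */
    assert(!eq_bitmask(y_lower, y_upper, 2));      /* missed case pair */
    assert(!eq_ascii_only(mul_sign, div_sign, 2)); /* compared exactly */
    assert(eq_ascii_only((const unsigned char *)"IETF",
                         (const unsigned char *)"ietf", 4));
}
```

Calling label_compare_examples() exercises both kinds of error mentioned
above: the mask treats two different characters as equal, and it fails
to equate a pair that Unicode does consider upper and lower case. The
ASCII-only variant avoids the spurious match but, by design, does no
case folding at all outside ASCII.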