Keith Moore wrote:

> First, it's naive to assume that UTF-8 will be the native
> representation on everybody's platform

Clarification: I did not say anything about UTF-8 on platforms, but
instead cited native representation in protocol messages.

However, if UTF-8 is the encoding of choice for a particular protocol's
data or message formats, then we know that UTF-8 is also going to be
incorporated into the necessary supporting functions at the participating
end-points. This doesn't mean that the whole box has to use UTF-8.
Frankly, I don't even think that's relevant. Instead, it is a question of
whether or not the related components (searching, as in the previous
example) will be likely to deal with UTF-8, rather than having to
selectively graft an extraneous encoding into select portions of that
service in order to provide simple functionality (as with 2047 and
searching, again).

But this is also entirely irrelevant. Given your argument that the
transfer encoding is irrelevant, I would like to hear how, say, using
EBCDIC to pass ASCII data around could possibly be seen as reasonable
design.

Of course the native encodings are always best. The fact that most of the
apps are heading towards UTF-8 should tell us that we should be designing
for a long-term support infrastructure that provides the data in the
format it is going to be used in. Furthermore, whenever the remaining
services get upgraded or replaced, they should be able to use something a
little better than the best technology that 1968 money can buy.

> Second, the portion of IDNA that does ASCII encoding is such a trivial
> bit of code that the number of failures introduced by that code will
> pale in comparison to those introduced by the other code needed to
> handle 10646 (normalization, etc) which would be needed no matter what
> encoding were used.

Getting new problems in addition to shared problems is hardly an argument
in your favor.
You've already conceded that 2047 has some problems with transliteration
goofiness, and that restricting it to unstructured data limits the real
damage that is caused. Are we to believe that extending structured data
with mandatory transliteration will not cause the problems you thankfully
avoided?

> Numerous examples demonstrate that transition issues are often
> paramount in determining whether a new technology can succeed.

I agree that transitional services are important. I also think that the
evidence shows that end-station layering works well when existing formats
are used as carriers for *subsets*, and when it is targeted to a specific
usage.

That isn't what's being done here, though. Instead, well-known and
commonly used data-types will get *extended* into broader and
incompatible forms by default, and it will happen both purposefully and
accidentally. This is not transitional probing; it is knowing that stuff
will break and doing it anyway.

Cripes, why do we have to do it all in a big bang? Can't we start with the
transfer encoding (no required upgrades for anything), incrementally add
transliteration where we know it will be safe and robust (some upgrades),
and then add UTF-8 for those newer services that can make use of it (some
more upgrades)? What is the problem with this?

> Simplicity is often a virtue, but IDN is inherently complex - it
> reflects the tremendous variety in the world's languages and
> writing systems. And blind faith in some vague notion of
> cleanliness is a poor substitute for engineering analysis.

That's almost a fair shot. I do put a bunch of faith into transparent
data-types and structures. Dunno about "blind". ASCII is always best when
it's encoded as ASCII, after all.

> reliability. But the need to allow incremental upgrade of
> legacy application components strongly compels IDNA, and the
> incremental benefit of a native UTF-8 query interface beyond
> that of IDNA does not appear to justify the additional complexity.

The complexity required for a direct UTF-8 name-resolution service in
conjunction with simple passthru-everywhere is minor in comparison to the
complexity of transliterate-everywhere.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/