--On Tuesday, 04 January, 2005 09:38 -0500 Bruce Lilly <blilly@xxxxxxxxx> wrote:

>> One is not. Domain names are strings of characters; only
>> incidentally do they spell out one or more words in one or
>> more languages. I doubt whether the names "Google," "Yahoo,"
>> and "AltaVista" can be pinned down as belonging to one
>> specific language.
>
> I was referring specifically to internationalized domain names
> (IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
> domain name continues to be of traditional form (ANSI X3.4
> letters, digits, and hyphen (with restrictions on combinations
> and placement)), but where a certain class of names (those
> beginning with "xn--") are "internationalized" and might be
> presented to users in a different form (which can include
> non-ASCII characters). That came about because of the
> tendency to associate a domain name (tag) with a natural
> language "name" or legally-registered name (trademark, etc.).
> Whether one considers such associations logical or
> irrational, that is what has happened. So one could have
> a domain name (beginning with xn--) that is presented by
> an application as "Nestlé.com". Now certainly some names,
> such as your examples, Kodak, Häagen-Dazs, etc. have no
> language (because they are made-up strings of characters),
> but others do have a specific language. In skimming through
> the RFCs mentioned above, it appears that there is now some
> provision for language tagging (which was not present in
> earlier versions of IDN). However, I have not thoroughly
> reviewed those recent additions; therefore it should be
> clear that I have not reviewed the impact of the proposed
> draft changes on IDN or vice versa. Such a review should
> take place (ideally before the deadline for the New Last
> Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
> not the person to do so as I have only slight interest in
> IDN (I'm one of those who considers associating a tag
> with natural language and/or legally registered names to
> be irrational). One potential issue is that domain names
> are case-insensitive, and whether lower-case accented
> characters map to/compare with unaccented upper-case
> letters may be a function of language (or culture, or
> political fiat).
> ...
> I would add that there is apparently some discussion of
> wreaking similar havoc on local-parts, which appear in
> message-identifiers and email mailbox identifiers (STD 11).
> That too should be evaluated w.r.t. specification of
> language and the proposed changes.

Bruce,

While I'm sympathetic to many of the points you have raised, the IDN situation is not an issue except in a very narrow sense, and a similar situation would apply to local-parts if we ever do something there.

In the IDN case, the protocols are written in terms of arbitrary Unicode strings and just about have to be -- there has never been a DNS restriction requiring that the labels be names or words in a language. The protocols apply some mapping rules that reject a few characters (and hence the labels that contain them) and change some characters into others, but the net effect is still a set of standards written in terms of strings, not languages.

There has been a good deal of concern in the DNS community about the potential for deliberately or accidentally misleading users about domain names and the consequent opportunities for confusion or outright fraud. Those concerns have led to a good deal of work on restrictions on what strings can be registered, imposing, e.g., rules that the holder of one string may be the only permitted holder of a related one and rules that prohibit mixing scripts within a single label.
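To make the strings-versus-policy distinction concrete, here is a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII/ToUnicode operations of RFC 3490 (nameprep, including case folding, plus Punycode). The protocol step turns any acceptable Unicode label into an "xn--" form without regard to language or script; the script-mixing check is a separate, registry-side rule. The mixes_scripts and rough_script helpers are hypothetical: they guess a character's script from the first word of its Unicode name, a crude stand-in for the Unicode Script property that real registry policies would use.

```python
import unicodedata


def to_ascii(domain: str) -> str:
    """Protocol step: Unicode domain name -> on-the-wire ACE form.

    Uses Python's built-in "idna" codec (IDNA 2003, RFC 3490), which applies
    nameprep (including case folding) and Punycode to non-ASCII labels.
    """
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Protocol step: ACE form ("xn--...") -> Unicode form for display."""
    return domain.encode("ascii").decode("idna")


def rough_script(ch: str) -> str:
    """Hypothetical heuristic: treat the first word of the Unicode character
    name (e.g. "LATIN", "CYRILLIC", "CJK") as the character's script.
    Real registry policies use the Unicode Script property instead."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "UNKNOWN"


def mixes_scripts(label: str) -> bool:
    """Policy step (hypothetical registry rule): flag labels whose letters
    come from more than one script. Digits and hyphens are ignored."""
    scripts = {rough_script(ch) for ch in label if ch.isalpha()}
    return len(scripts) > 1


if __name__ == "__main__":
    # "\u00e9" is e-acute; "\u0430" is CYRILLIC SMALL LETTER A, a look-alike of Latin "a".
    for name in ["Nestl\u00e9.com", "p\u0430ypal.com"]:
        ace = to_ascii(name)
        first_label = name.split(".")[0]
        print(f"{name!r} -> {ace} -> {to_unicode(ace)!r}; "
              f"mixed scripts: {mixes_scripts(first_label)}")
```

Both names encode to "xn--" forms without complaint, and "Nestlé.com" comes back as "nestlé.com" because nameprep folds case; only the policy check separates the look-alike. The crude heuristic would also flag a Japanese label that mixes kanji and kana, a legitimate mixture that simple per-script rules handle poorly.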
These types of rules, especially the latter, are the "very narrow sense" mentioned above, but they have no impact on the protocols themselves. The registration rules actually differ from zone to zone and can safely do so because, to the user of the DNS, an unregistered name is an unregistered name: whether a name is unregistered because no one wanted it or because some subtle rule prohibited its registration is unimportant.

The situation with local-parts will, most of us are convinced, work out in much the same way. There is a long history of strings used in local-parts that are not "names", "words", or otherwise bound to a particular language. Worse, different destination systems apply different internal syntax rules and interpretations to local-part strings. Protocols will need to be designed to reflect that history and avoid unreasonable restrictions. At the same time, I would expect the administrators of a given local system to impose restrictions on what local-parts can be used for mailboxes there (just as is often done today). Those restrictions may, in many cases, reflect assumptions about languages and/or scripts but, since they are purely local conventions, there is no need for external registration.

Returning to the DNS/IDN situation, ICANN has created a recommendation for all TLDs, and a requirement on at least some gTLDs, that languages not be mixed within a label and that tables similar to those recommended by RFC 3743 be registered and used. Those tables are identified by a combination of the domain name associated with the registering TLD registry and an RFC 3066 code. That system is not, IMO, working especially well, and the 3066 code model will, I think, have to be extended to deal with some unusual situations. But, interestingly, draft-phillips...
doesn't appear to solve that particular problem: what is needed is a way to specify odd mixtures of languages and/or scripts that may be appropriate to a particular zone, and that means less specificity and more linguistically-strange constructions, not more specificity and structure.

john