John,
You are absolutely right. I believe there is a current discussion within the Mozilla community about flagging IDNs. That would certainly help, but it would not help with 'sex.com' (as one guy noted on circleid) if 'sex' is written entirely in Cyrillic.
I am also concerned about creating new protocols to 'look up valid characters' within an IDN. Nice idea, but it would add a lot of overhead for application developers and for the network.
I have taken a look at your name munging draft. I think we did an implementation of a similar idea back when I was at i-DNS, except we did it using a modified DNS. You can query (utf).foobar.tld and the query will respond with CNAME (ACE).tld or something like that. Of course, it is a private implementation and the idea needs more work. But I think it might be useful to look at DNS rather than inventing a new protocol. (Okay, for those who are afraid to load more stuff onto DNS, we can also consider SOAP.)
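As a rough illustration of the (utf) to (ACE) mapping described above, the sketch below computes the ACE name that such a CNAME answer would carry. It uses Python's built-in "idna" codec, which implements the RFC 3490 ToASCII conversion. The function name ace_target and the sample name are made up for illustration; this is not the private i-DNS implementation, only the name conversion it would rely on.

```python
def ace_target(utf8_name):
    """Return the ACE (Punycode, "xn--") form of a Unicode domain name.

    Hypothetical helper: a modified DNS server of the kind described
    could answer a query for the Unicode name with a CNAME to this form.
    """
    # The stdlib "idna" codec applies nameprep and Punycode label by label.
    return utf8_name.encode("idna").decode("ascii")


if __name__ == "__main__":
    # "b\u00fccher" is "bucher" with u-umlaut; pure ASCII names pass through.
    print(ace_target("b\u00fccher.example"))
    print(ace_target("example.com"))
```

Doing the conversion in the resolver path keeps applications unchanged, which is the appeal of the DNS-based approach; the cost is the extra load on DNS mentioned above.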
ps: Sorry for the duplicates. Fast fingers :P
-James Seng
On 09-Feb-05, at AM 03:41, John C Klensin wrote:
James,
At one level, you are clearly correct, and several other people have made the same observation today and over the last four years. Certainly, as my note and others indicated even before you posted yours, this is old news. That fact doesn't change things much, even if it has some limited "I told you so" value. Obviously we could quibble about whether the warning about implementation warnings in RFC 3490, or the IESG statement, or the ICANN guidelines, separately or together, go far enough or are precise enough about what is needed to be useful. But we have had that particular discussion several times and repeating it here (or elsewhere) has ceased to be entertaining and is probably not useful.
It is perhaps worth noting that the 3490 suggestion of "visual indications where a domain name contains mixed scripts" may be problematic as a recommendation for actual implementations: although inverting them would probably help, the tables for which characters fall into which of the huge number of scripts in the world represent far more baggage for an implementation to carry around than the nameprep/stringprep tables required by IDNA.
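A minimal sketch of the mixed-script check under discussion, with a loud caveat: Python's standard library does not ship the Unicode Script property (which is itself an instance of the "baggage" problem), so the code below approximates a character's script by the first word of its Unicode character name. That shortcut is an assumption made for illustration and is not a substitute for real script tables. The function names are hypothetical.

```python
import unicodedata


def rough_scripts(label):
    """Return approximate script names for the letters in a label.

    Approximation: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) stands in for the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """True if the label's letters appear to come from more than one script."""
    return len(rough_scripts(label)) > 1


if __name__ == "__main__":
    spoof = "p\u0430ypal"                # second letter is Cyrillic a (U+0430)
    all_cyrillic = "\u0455\u0435\u0445"  # Cyrillic dze, ie, ha: resembles "sex"
    for label in ("paypal", spoof, all_cyrillic):
        print(ascii(label), sorted(rough_scripts(label)), is_mixed_script(label))
```

Note that the all-Cyrillic label is not flagged as mixed, so a check of this kind catches only the mixed-script case and does nothing for whole-label homographs.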
Similarly, for an application to check the characters/scripts supported by a particular ccTLD, as suggested by one of the contributors to your blog, is problematic. One would really not like to carry around a list of ccTLDs and their permitted characters in an application, first because that could be a very large set of tables, second because the lists change as ccTLD policies evolve and tables in deployed applications are notoriously hard to update, and third because it does nothing for gTLDs. One could design a bit of protocol that would permit querying the ccTLD for the accepted character list (or a protocol similar to that described in draft-klensin-name-munging-03.txt could be trivially extended), but that would add overhead and raise issues about authoritative and authenticated lists and do nothing for third level subdomains and below. One might, more plausibly, think about a new DNS RR whose data field would contain the permitted character list for a given zone, perhaps with some provision for ranges (which would perhaps keep the DNS record from being overwhelmed by CJK), but no one has even started thinking that through carefully as to whether it would be either feasible or useful in practice.
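To make the "permitted character list with ranges" idea concrete, here is a sketch of how an application might check a label against such a list. No such RR exists; the text format ("U+XXXX" code points and "U+XXXX-U+YYYY" ranges separated by spaces), the sample policy, and the function names are all invented for illustration. How the list would be fetched and authenticated is exactly the unsolved part and is not shown.

```python
def parse_ranges(spec):
    """Parse a hypothetical 'U+0061-U+007A U+002D' list into (low, high) pairs."""
    ranges = []
    for item in spec.split():
        low, _, high = item.partition("-")
        lo = int(low[2:], 16)                   # strip the "U+" prefix
        hi = int(high[2:], 16) if high else lo  # single code point: low == high
        ranges.append((lo, hi))
    return ranges


def label_permitted(label, ranges):
    """True if every character of the label falls inside some permitted range."""
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in label)


if __name__ == "__main__":
    # Made-up zone policy: a-z, 0-9, hyphen, and the CJK Unified Ideographs
    # block expressed as one range rather than thousands of entries.
    policy = parse_ranges("U+0061-U+007A U+0030-U+0039 U+002D U+4E00-U+9FFF")
    for label in ("example-1", "ex\u0430mple", "\u4e2d\u6587"):
        print(ascii(label), label_permitted(label, policy))
```

The single CJK range shows why range support matters for record size: one pair of code points covers the whole block.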
For the application to identify whether or not something is an IDN (another suggestion that has been made several times) is also not likely to be especially useful except in the short term: homographic problems can be demonstrated as easily between pairs of non-Roman scripts as between ASCII names and some other script, with the combination of Greek and Cyrillic being one example that is particularly rich with "opportunities".
I actually think the mini-flap about this example is helpful. Sure, it is old news to the specialists (including everyone who participated in the IDN WG and paid attention and everyone who has read 3490 and the IESG note carefully). But, despite that specialist knowledge, the issue has not gotten the attention of the community: within the last 24 hours, users and registrants have been surprised, a few have even been horrified, and that is probably a good thing. Applications have been implemented that support IDNA without any of the "maybe this is bogus" warnings that we know how to provide, and think should be provided, and perhaps this will get the attention of those implementers too. And, without bashing any particular offender --despite the implication in your blog, the one you single out is not unique and it is not useful in any event-- perhaps this will help convince registries who have been inclined toward either "IDNA with no further restrictions" or "very broad ranges of characters, intermixed" policies that the recommendations to restrict labels to specific languages and/or scripts really are important from a customer protection standpoint and that they have some responsibility in that area.
Regards, john
p.s. Jefsey, the only relevant WSIS message here is that blindly going off and enthusiastically implementing and deploying IDNs (or IRIs), without consideration of these issues that have been well-understood for years, puts users at risk to the point of being irresponsible. That doesn't imply in any way that IDNs are bad, only that one needs to consider the systems of which they are a part, do registrations and implementations with appropriate safeguards and user education, and that handwaving is not a sufficient way to address the issues. And, again, that is very old news -- to the extent that, if the WSIS process and contributors don't understand it, it is a problem with their processes.
--On Tuesday, 08 February, 2005 23:21 +0800 James Seng <james@xxxxxxx> wrote:
For the 5th time today, it is already documented in RFC 3490.
http://james.seng.cc/archives/2005/02/08/idn_and_homographs_spoofing.html
JFC (Jefsey) Morfin wrote: Maybe IDN specialists will want to comment on this. http://www.shmoo.com/idn/homograph.txt Is this accurate? This is urgent, as the IRI is based upon IDN, support of multilingualism is a WSIS priority, and comments for the WGIG close the day after tomorrow. Thank you. jfc
_______________________________________________ Ietf@xxxxxxxx https://www1.ietf.org/mailman/listinfo/ietf