Bruce,

I'll try to respond to the issues and questions you raise, but
please note that the landscape here is strewn with dead horses
and that kicking them is not a particularly helpful or rewarding
activity.

--On Tuesday, 08 February, 2005 14:54 -0500 Bruce Lilly
<blilly@xxxxxxxxx> wrote:

>> Date: 2005-02-08 08:39
>> From: John C Klensin <john-ietf@xxxxxxx>
>
>> Well, it is a little worse because there are tools that make
>> detection of the YAH00.COM problem and its relatives pretty
>> easy and those tools are widely understood.  For example,
>> forcing those domain names to lower case makes them very
>> distinguishable (yahoo.com and yah00.com are pretty clearly
>> different) and using fonts that make zeros and "o"s, ones and
>> "l"s, etc., clearly different helps a lot too.
>
> On the other hand, using lower case won't help if the
> "attacker" uses Greek omicron instead of Latin 'O'.

As I have said elsewhere, there are _many_ opportunities for
confusion here.  I assume that those who constructed this
particular example wanted to use a well-known phishing target.
The particular YAHOO example, as distinct from the paypal
example, came from a comment on the IETF list.  In both cases,
better (i.e., more difficult to detect) examples are possible,
especially with fewer or different constraints.  But that isn't
the point, is it?

>> With IDNs, the simple fact that there are tens of thousands of
>> characters with which one can try to create confusion, rather
>> than 37 or so, means there are going to be more
>> "opportunities".  What is more important, perhaps, is that we
>> just don't have the experience with the design of user
>> interfaces that make problem detection easy.  For example,
>> the moment I touched the Firefox cursor to the examples at
>> http://www.shmoo.com/idn/, I realized that I really wanted to
>> see the punycode in the status line as well as the "native
>> character" rendering.
>
> I assume that rather than "punycode" (which is an encoding
> scheme used for *part* of IDNs) you mean the on-the-wire
> dot-separated DNS name components consisting solely of
> letters, digits, and hyphens.  If so, I have two comments:
> 1. That's not likely to help, as humans aren't very adept at
>    decoding IDNs on sight, and distinguishing one IDN from
>    another on sight isn't something that one would expect
>    casual users to be able to do; all IDNs tend to look like
>    "xn--blah", and many casual users lack any of concern,
>    interest, inclination, or patience to look beyond "xn".

I think I said that, although in different language :-(.  If one
is expecting an ASCII string, then seeing a punycode label
instead would be a strong tip-off that there is a problem.  If
one is expecting an IDN string, then seeing a punycode label in
the string that is presented would be a far less useful hint.

> 2. That would defeat the intent behind IDN, which is to present
>    what the on-the-wire DNS name represents rather than that
>    on-the-wire DNS name.

Let me try to say this carefully.  The "intent behind IDN" is to
permit people to use local languages and characters in what
appears to them to be DNS labels.  Until and unless every one of
us has a keyboard that permits easy input of every Unicode
character (and I don't mean by knowing and typing in its code
point position) and the knowledge and character
perception/discrimination ability needed to use such a magical
keyboard, there will always be some likelihood of reversion to
punycode -- not for the local language and characters, but for
presentation-form FQDNs that contain characters from very far
away.  I hope that doesn't happen very often.  I expect that it
will happen less often as time goes on.  But I don't expect we
will reach zero, at least within the next decade or two.
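
To make the "both forms in the status line" idea concrete, here
is a minimal sketch in Python.  It assumes the standard
library's "idna" codec (which implements the IDNA 2003 rules) is
an acceptable stand-in for whatever a browser really does; the
function names are hypothetical and are not taken from any
browser or from the IDN specifications.

```python
# Illustrative only: function names are assumptions, not a real browser API.
# Uses Python's built-in "idna" codec (IDNA 2003 / RFC 3490 rules).

def to_ace(name: str) -> str:
    """Return the on-the-wire letters/digits/hyphen form of a domain name."""
    return name.encode("idna").decode("ascii")

def to_native(name: str) -> str:
    """Return the native-character form of an on-the-wire domain name."""
    return name.encode("ascii").decode("idna")

def has_ace_label(name: str) -> bool:
    """True if any dot-separated label carries the "xn--" prefix."""
    return any(label.lower().startswith("xn--") for label in name.split("."))

def status_line(name: str) -> str:
    """Show the native form, plus the on-the-wire form when the two differ."""
    ace = to_ace(name)
    if ace.lower() == name.lower():
        return name
    return f"{name} [{ace}]"

if __name__ == "__main__":
    lookalike = "yah\u03bfo.com"            # second "o" is Greek omicron
    print(status_line("yahoo.com"))         # plain ASCII: shown unchanged
    print(status_line(lookalike))           # native form plus its "xn--" form
    print(has_ace_label(to_ace(lookalike)))
```

A user who expects a plain ASCII name and sees an "xn--" label
in the bracketed part has the tip-off described above; a user
who expects an IDN learns much less from it.
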
Whether or not you expect that, there is a huge difference
between seeing native-character text in the display of the DNS
name or URI/IRI on the web page --where I would hope to _never_
see punycode-- and what can optionally be turned on in a status
line.

But note that we are talking about user interface issues here,
not standards.  If a user wants that status line information,
let her have it.  If he doesn't, so be it.  And, if a browser
doesn't offer the needed flexibility to give users what they
want, I presume that users who care enough will find other
browsers.

As another piece of this, my own guess --and I want to stress
that it is just a guess, not a proposal for a standard or
requirement-- is that whatever mechanism is used to copy DNS
names or URIs from one place to another will acquire separate
"copy native characters" and "map to punycode and copy that"
options.  I'd expect similar options for IRIs, i.e., "copy IRI"
and "force into URI format with escaped characters and copy
that".  Why?  Because, if I am a sensible and cautious user of
Lower Slobbovian script and I'm sending an IDN or IRI on paper
to a user who is not familiar with that script, I'm going to
send the punycode or URI form along as a safety precaution.
YMMV, of course, and you might plausibly prefer to let only
people who know and can read and type your script get to your
content.  But we should both, IMO, have the options of doing
whatever meets our needs.

> I'd add that one approach to the problem would be to undo the
> encoding, query DNS to get an IP address, then present that
> (possibly with associated SOA information and reverse domain
> name lookup); numeric IP addresses aren't going to be mistaken
> for some random collection of "characters" (in the Unicode
> sense) or non-numeric glyphs.

In the discussion above, you made the observation that end users
are not likely to be good at decoding punycode-containing IDNs
on sight.  We agree.
Do you think those users are going to be better at looking at an
IP address and figuring out if it belongs to whomever they think
it belongs to?  If your answer is "yes", does it change when you
think about IPv6: much longer addresses, multiple addresses per
host, etc.?  As you are thinking about this, note that the
world's most popular operating system doesn't support a "dig" or
"nslookup" function in most of its versions/variations.

Where it is possible to read the characters and type them back
in, the easiest protection against this type of attack is
extremely well-known from the ASCII-only world, and that is to
type in the URI or IRI one thinks one sees, rather than clicking
on a link.  Now, realistically, no one is going to do that,
especially with complicated URLs, unless they have reason to be
suspicious (of course, the seriously security-paranoid _always_
have reason to be suspicious).  Nor, again, absent suspicion, is
anyone going to attempt reverse mappings or traceroutes on every
domain name.

But, again, let's take this up a level of abstraction and
remember that we are talking about user interfaces -- an area in
which IETF competence has been shown to be, well, limited.
_Nothing_ is going to completely identify and accurately
diagnose every possible case of phishing, fraud, misleading
names, evil programs catching typographical errors, and so on.
That statement is true whether we are talking about IDNs or
about an IDN-free environment.  What we should hope is that
those who provide applications and user interfaces will provide
their users and customers with a sufficient range of options to
detect what can reasonably be detected, to create the right
level of suspicion with an acceptable level of ugliness, to
produce warnings where appropriate, and to give users the
appropriate tools for checking out things that seem dangerous.
We can hope that the marketplace rewards applications and
applications-writers who do a good job of that.
Those of us who are a little bloody-minded will probably also
hope that natural selection will appropriately reward those
lusers who turn off all of the checking or select applications
that don't have it because those applications provide a more
elegant user experience.  But nothing the IETF can or will do is
going to help with any of that.

> Regarding suggestions that some authority or authorities
> should enact some restrictions intended to prevent such
> misleading names; in the absence of a globally-recognized
> and effective enforcement mechanism, such measures are
> meaningless.  And I would hasten to add that a Big
> Brother-esque world that such things would lead to would
> be highly undesirable (at least by those of us who have
> no interest in being "Big Brother").

See "dead horse" above.  The IETF decided to throw whatever
parts of this it could even theoretically control over the wall,
and over the wall is probably where it belongs.  However, you
should be aware already that many, perhaps most, domains (at any
level of the tree) have created and enforce rules about the
names they are willing to register, and it has pretty much
always been that way.  Like the DNS, many of those decisions are
extremely distributed: if you don't like the rules of one
domain, you are free to find another one whose rules you do
like, or to register something somewhere and then make up your
own rules for its subdomains.  Other preferences and
restrictions get tied up with trademarks and enforced by
lawyers, and neither your religious convictions nor mine are
likely to change that very much.

IDNs, again, make some things more complicated.  A number of
entities have found that, for various reasons, rather aggressive
registration restrictions, sometimes ones that bind groups of
names together, are in the interests of the populations they
serve -- that is what, e.g., RFC 3743 is about.  Others haven't.
You pay your money and you make your choices.

>> Just as with the YAH00.COM case, no single measure is going to
>> "fix" or prevent the various problems we can encounter with
>> IDNs.  But a combination of some thinking, good policies,
>> adapting tools on the basis of experience, and the level of
>> user vigilance that seems a requirement for being attached to
>> the Internet at all these days ought to permit us to use IDNs
>> at risk comparable to that for LDH-style ASCII names.

> I suspect the problem is intractable, and is rooted in the
> (IMO ill-conceived) conflation of public DNS "names" (meaning
> keywords in the RFC 1958 / RFC 2277 sense) with natural
> language / legal "names" (proper names, trademarks, etc.).
> [And I agree with Ohta-san's statement that we are observing
> the inevitable consequences; not only of internationalization,
> but of the underlying conflation of protocol elements with
> natural language names.]

You don't need to convince me.  See RFC 3467 and, to a lesser
degree, RFC 3071.  Or you might try to dig out a copy of
draft-klensin-dns-search-06.txt, which I hope to find time to
get back to some day.

But the marketplace and, following rather than leading it, the
IETF, made a different set of decisions.  Much as I might have
wished it otherwise, DNS names stopped being purely protocol
elements the first time it occurred to someone to put a URL on
the side of a bus or in an advertisement with a popular
audience.  That particular genie isn't going back in the bottle
(again, much as some of us might wish otherwise) and no amount
of revising statements of architecture is going to make any
difference.

> I would also like to take this opportunity to repeat an earlier
> suggestion, viz. that the IAB should update RFC 1958 and give
> that update some status more substantive than "Informational".
> In particular, such an update should clearly state that
> protocol elements are simply that; any resemblance to natural
> language names, places, or things is purely coincidental.

Sure.
Who do you think would pay attention to such a statement?

>> I can only hope that our colleagues at Mozilla will rapidly
>> supersede their apparent advice to disable IDNs --advice that
>> seems to me to be equivalent to "you should be happy just
>> using English"
>
> I don't think that is the equivalent; letters, digits, and
> hyphens are not peculiar to English, nor are domain name
> components tied to any language -- they are simply protocol
> elements that identify places in a hierarchical database
> which maps to a database of values associated with a
> hierarchical assemblage of such elements.
>
> IMO, advice to disable IDNs is good advice; no
> "internationalization" of protocol elements was necessary in
> the first place, and the mechanism -- like a number of other
> mechanisms in URL syntax (e.g. user/password delimiters in the
> "authority" section, %-encodings) which have long been used to
> obfuscate or mislead -- leads to predictable consequences.  I
> note in passing that other browser suppliers have disabled
> similar mechanisms because of concerns about the sort of issue
> under discussion.

Like it or not, there is a large population in the real world
who are not interested in that argument or position.  There are
even folks who are technically sophisticated enough to
understand and accept your argument about protocol identifiers
who nonetheless believe that they should be able to identify
objects with names or acronyms that have mnemonic value in their
languages and character sets.  Personally, I have a lot of
trouble disagreeing with the latter group, and have learned that
disagreeing with the former one doesn't get me anywhere.

best,
   john

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf