--On Monday, January 26, 2015 12:13 -0600 Nico Williams
<nico@xxxxxxxxxxxxxxxx> wrote:

> On Mon, Jan 26, 2015 at 07:35:42AM -0800, Asmus Freytag wrote:
>> On 1/26/2015 1:12 AM, Nico Williams wrote:
>> > As far as I'm concerned it's clear that the correct way to
>> > handle these cases is: as confusables.  Is this wrong?
>>
>> I basically agree with you.
>>
>> I'm making a further distinction between confusables by
>> accident and confusables by intent, and am advocating that
>> the latter can be handled more explicitly. But basically, yes.
>
> I'm not sure that the cause of the confusability makes any
> difference.  It's there.  Once it's there we have to deal.

I don't know if this is related to what Asmus was thinking, but
I think there are two kinds of intent.

One is intent in the Unicode coding, where a deliberate decision
was made to assign different code points to glyphically-identical
characters ("homographs" or "homoglyphs") within the same script.
For that case, we, or at least UTC, presumably know what the
characters are and could, at least in principle, make a list of
them or assign a special property value to them.  In the grand
scheme of things, they should be very easy to identify even if
what to do once they are identified might be controversial.  For
example, if the only tool we had was to ban one or another code
point or combining sequence of a "confusable" pair, it may not be
obvious which one to prohibit.
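To make "different code points for identical-looking characters"
concrete, here is a small sketch using Python's standard
unicodedata module.  The helper names (normalized_forms,
ever_equal) are hypothetical and chosen only for this
illustration.  It checks whether the code point U+08A1 and the
sequence U+0628 U+0654 are ever unified by any of the four
Unicode normalization forms, and contrasts that with
ALEF WITH HAMZA ABOVE (U+0623), which does have a canonical
decomposition.

```python
import unicodedata

# The two spellings at issue: one precomposed code point versus
# a base letter followed by a combining mark.
precomposed = "\u08A1"       # ARABIC LETTER BEH WITH HAMZA ABOVE
sequence = "\u0628\u0654"    # BEH + combining HAMZA ABOVE

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return the text under each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def ever_equal(a, b):
    """True if some normalization form maps both strings to the same value."""
    fa, fb = normalized_forms(a), normalized_forms(b)
    return any(fa[form] == fb[form] for form in FORMS)


if __name__ == "__main__":
    # U+08A1 has no decomposition, so normalization never unifies the pair.
    print("U+08A1 vs U+0628 U+0654 unified:", ever_equal(precomposed, sequence))
    # Contrast: U+0623 decomposes canonically to U+0627 U+0654.
    print("U+0623 vs U+0627 U+0654 unified:", ever_equal("\u0623", "\u0627\u0654"))
```

Since normalization does not help for the first pair, any
comparison that treats the two spellings as "the same" has to
come from somewhere else, such as an explicit list or property
of the kind mentioned above.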
Taking U+08A1 as an example because it is the case that started
this: if one were not constrained by stability rules or the like,
it would not be clear whether it would be better to allow it
(because it obeys the rules that Asmus summarized in yesterday
afternoon's note and is more compact) or the combining sequence
\u0628\u0654 (because it is more likely to be used, expected, or
keyed in by Arabic speakers or users of languages written in
Perso-Arabic variations, and there are many more people in those
two groups than there are writers of Fula in Arabic script).

The other sort of intent involves a would-be attacker
deliberately trying to create confusion, to mislead the user, or
to create distrust of the identifier system (in the IDNA case,
the DNS or IDNs generally).  There is nothing accidental about
those cases, and they are difficult precisely because none of the
fine distinctions we are making, about the differences among "the
same glyph (grapheme cluster)", "the same (or different) abstract
character", and "things that look alike under some set of
circumstances that an attacker might be able to control or
exploit", matter there.

And then there are accidents, either of cross-script similarities
or identities (because of historical copying, some may not really
be accidental) or of user perception, arising from the
combination of how the characters appear and how users perceive
them.

I think the first of these is (or should be) much easier to
handle in a systematic way than the other two, but that, if we
want internationalized identifiers, we'd better be able to do
better with the others than trying to educate users to be really,
really careful, perhaps to the point of paranoia ...

--On Monday, January 26, 2015 12:09 -0600 Nico Williams
<nico@xxxxxxxxxxxxxxxx> wrote:

>...
>> Yes, indeed. Which is why, for years, this was a requirement
>> of IDNA enablement in Firefox.  Only the proliferation of
>> registries put an end to our enforcement of that policy
>> programmatically.
>> We (or at least, I) now intend to enforce
>> it via the media if there is ever a problem caused by a
>> registry allowing one of its customers to attack another one
>> by registering a homograph.
>
> Right, if a registry screws this up, their reputation has to
> suffer.
>
> (The same goes for CAs, no?  Though of course DNS has to come
> first.)

While I'm certainly in favor of shaming evildoers, keep two
things in mind.  First, while the number of distinct registry
operators is much smaller, the number of TLDs may soon exceed the
number of active CAs.  The total number of zones and zone
administrators probably deserves terms like "astronomical".
Perhaps unlike the CA environment (or perhaps not), there is a
fairly impressive history of registrars and retailers who are
willing to delegate obviously-deceptive names if doing so
improves their bottom line even slightly and who are quite happy
to hide the names and contact information of their customers.

If we don't do the best we can to control that situation, it more
or less invites regulator intervention that could fragment the
DNS namespace or worse.  I certainly have days, and assume that
Gerv does too, when I'd be delighted to see those regulators and
their law enforcement associates show up.  But history suggests
that, when they do, they are likely to be heavy-handed enough to
be very bad for the DNS and the Internet.

    john

>
> The details of how a confusable came about are certainly
> interesting, but they don't really matter to how we handle
> them, right?
>
> Nico