On Mon, Mar 07, 2005 at 06:25:31PM +0100, Michael Roitzsch wrote:
> Hi security community,
>
> this is my first publication I post on Bugtraq, so please be patient
> with me.
>
> Since the recent problems with IDN, I wanted to clear up my thoughts on
> homograph attacks, so I sorted everything in an article which also
> contains what I believe to be an easy and general solution.

The guts are:

|| I propose to present the user with a dialog showing the text to be
|| validated and an input field, into which the user has to type in the
|| given text again. The user is told, if both texts match precisely and
|| what this means: If the typed text's internal representation matches
|| the given text bit-by-bit, trust can be established. If it does not
|| match, the user is told to re-check for typing errors and not to
|| establish trust.

Problems with this approach:

1. The user will see this as an irritation and won't perceive it as
   helping them keep their computer secure. Hence they will want to
   turn the feature off. People select usability over security unless
   they clearly understand the security problem and the usability
   difference is manageable.

2. The earlier description says that the quoted actions should occur
   "Whenever the user has to validate textual information to establish
   trust," but I believe that this even includes the case where the
   user starts a blank browser and then pastes a URL into the address
   box.

3. "matches the given text bit-by-bit" can give spurious negative
   results in many circumstances. This particularly applies to those
   languages making use of Unicode "combining marks". Combining
   characters are additional marks following a main character that are
   essentially "decorations" to it. Usually the order of combining
   characters is significant, and the user can see the difference
   between the orderings (and therefore knows in what order to type
   the characters to get the same effect they see).
   However, there are also other types of combining characters that
   have the same rendering no matter what order they're presented in.
   This is very similar to the original problem, though, and it is not
   clear to me that domain names containing such combining characters
   should be allowed (otherwise there will be two alternative Unicode
   code point sequences that appear the same but are actually
   different domain names, as I understand things).

   I suspect that where a domain name contains a sequence of combining
   characters of differing combining classes, the right thing to do is
   to allow as a registered domain name only the sequence in which the
   marks are encoded in the "canonical order". See section 3.11 of the
   Unicode Standard, version 4.0.

Apologies if I have misunderstood this area of Unicode; it's a bit
complicated and I don't have a history of immersion in Unicode.

The summary of my third point, then, is that worrying about the
possible differences in the ordering of combining marks is probably
the responsibility of whoever oversees the registration of the IDN,
and probably isn't something we can be expected to solve in every
piece of client software.

The length of this email is out of proportion to its usefulness.
Sorry.

James.
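
For illustration, the spurious mismatch described in point 3 can be
sketched in Python. This is only a rough sketch under my own
assumptions: the choice of Python's standard unicodedata module, the
two example marks, and the helper names bitwise_equal and
canonical_equal are all mine and are not part of the quoted proposal.
It compares two orderings of marks that have different combining
classes, first bit-by-bit and then after putting both into canonical
order via NFD normalization.

```python
import unicodedata

# Two combining marks with different canonical combining classes.
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

# The same visible text, encoded with the marks in either order.
shown = "a" + DOT_BELOW + DOT_ABOVE
typed = "a" + DOT_ABOVE + DOT_BELOW


def bitwise_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare the encoded bytes exactly."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonical_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare after NFD normalization, which
    applies the canonical ordering of combining marks."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print("combining classes:",
          unicodedata.combining(DOT_BELOW),
          unicodedata.combining(DOT_ABOVE))
    print("bit-by-bit match:", bitwise_equal(shown, typed))
    print("canonical match: ", canonical_equal(shown, typed))
```

Marks of the same combining class are not reordered by normalization,
so swapping, say, U+0301 and U+0308 still yields two distinct
sequences under either comparison. That matches the case above where
the order is significant and visible to the user.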