Re: thoughts and a possible solution on homograph attacks

James Youngman <james+yahoo@xxxxxxxxxxxxxxxxxxxxxxxx> · Mon, 7 Mar 2005 20:58:37 +0000

On Mon, Mar 07, 2005 at 06:25:31PM +0100, Michael Roitzsch wrote:
> Hi security community,
> 
> this is my first publication I post on Bugtraq, so please be patient with me.
> 
> Since the recent problems with IDN, I wanted to clear up my thoughts on 
> homograph attacks, so I sorted everything in an article which also contains 
> what I believe to be an easy and general solution.

Guts are :-

|| I propose to present the user with a dialog showing the text to be
|| validated and an input field, into which the user has to type in the
|| given text again. The user is told, if both texts match precisely and
|| what this means: If the typed text's internal representation matches
|| the given text bit-by-bit, trust can be established.  If it does not
|| match, the user is told to re-check for typing errors and not to
|| establish trust.
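As I read it, the quoted check reduces to an exact comparison of the
displayed string against the typed one.  A minimal Python sketch,
assuming UTF-8 is the "internal representation" (the proposal doesn't
say which); `texts_match` and the sample strings are hypothetical:

```python
def texts_match(displayed: str, typed: str) -> bool:
    # Hypothetical helper: compare the UTF-8 encodings byte for byte,
    # which is how I read "matches the given text bit-by-bit".
    return displayed.encode("utf-8") == typed.encode("utf-8")

# U+0430 CYRILLIC SMALL LETTER A stands in for the Latin 'a'.
displayed = "p\u0430ypal.com"
# What a user typing on a Latin keyboard would actually produce.
typed = "paypal.com"

print(texts_match(displayed, typed))     # False: the homograph is caught
print(texts_match("paypal.com", typed))  # True: the genuine name passes
```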

Problems with this approach:-

1. The user will see this as an irritation and won't perceive it as
   helping them keep their computer secure.  Hence they will want to
   turn the feature off.  People select usability over security unless
   they clearly understand the security problem and the usability
   difference is manageable.

2. The earlier description says that the quoted actions should occur
   "Whenever the user has to validate textual information to establish
   trust," but I believe that condition even covers the case where the
   user starts a blank browser and then pastes a URL into the address
   box, so the dialog would appear during completely routine browsing.

3. "matches the given text bit-by-bit" can give spurious negative
   results in many circumstances.  This particularly applies to those
   languages making use of Unicode "combining marks".  Combining
   characters are additional marks following a main character that are
   essentially "decorations" to it.  Usually the order of combining
   characters is significant, and the user would be able to see the
   difference between the orderings (and therefore will know what
   order to type their characters to get the same effect they see).
   However, combining characters of differing combining classes (for
   example, one mark placed above the base character and another placed
   below it) have the same rendering no matter what order they're
   presented in.  This, though, is very similar to the original problem,
   and it is not clear to me that domain names containing such
   combining characters should be allowed (otherwise there will be two
   alternative Unicode code point sequences that appear the same but
   are actually different domain names, as I understand things).

   I suspect that where a domain name contains a sequence of combining
   characters of differing combining classes, the right thing to do is
   to allow as a registered domain name only the encoding in which the
   marks appear in the "canonical order".  See section 3.11 of the
   Unicode Standard, version 4.0.  Apologies if I have misunderstood
   this area of Unicode; it's a bit complicated and I don't have a
   history of immersion in Unicode.  The summary of my third point,
   then, is that worrying about differences in the ordering of
   combining marks is probably the responsibility of whoever oversees
   the registration of the IDN, and isn't something we can be expected
   to solve in every piece of client software.
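To make point 3 concrete, here is a small Python sketch using the
standard `unicodedata` module.  The choice of COMBINING DOT BELOW
(class 220) and COMBINING DOT ABOVE (class 230) is just an illustrative
assumption, and `is_canonically_ordered` is a hypothetical
registry-side check, not something any registry is known to run.  It
shows a bit-by-bit comparison failing on two orderings that render
identically, normalization (NFD) treating them as equal, and a check
that accepts only the canonically ordered form:

```python
import unicodedata

BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220
ABOVE = "\u0307"  # COMBINING DOT ABOVE, canonical combining class 230

one = "a" + BELOW + ABOVE  # marks in canonical (ascending class) order
two = "a" + ABOVE + BELOW  # same marks in the other order; renders the same


def is_canonically_ordered(s: str) -> bool:
    """Hypothetical check: reject any adjacent pair of combining marks
    whose canonical combining classes appear in descending order."""
    for prev, cur in zip(s, s[1:]):
        prev_cc = unicodedata.combining(prev)
        cur_cc = unicodedata.combining(cur)
        if cur_cc != 0 and prev_cc > cur_cc:
            return False
    return True


print(one == two)  # False: a bit-by-bit comparison reports a mismatch
print(unicodedata.normalize("NFD", one) == unicodedata.normalize("NFD", two))  # True
print(is_canonically_ordered(one), is_canonically_ordered(two))  # True False
```

Note that this only looks at the order of the code points as given; it
does not decompose precomposed characters, so a real check would more
likely compare a name against its normalized form.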

The length of this email is out of proportion to its usefulness.  Sorry.

James.