Re: IDN security violation? Please comment

>  Date: 2005-02-08 19:57
>  From: John C Klensin <john-ietf@xxxxxxx>

> I'll try to respond to the issues and questions you raise, but
> please note that the landscape here is strewn with dead horses
> and that kicking them is not a particularly helpful or rewarding
> activity.

Noted. Ditto for hand-wringing.

> In both cases,
> better (i.e, more difficult to detect) examples are possible,
> especially with fewer or different constraints.  But that isn't
> the point, is it?

My point was that case conversion doesn't go very far as a
means of detecting such things.

> If
> one is expecting an ASCII string, then seeing a punycode label
> instead would be a strong tip-off that there is a problem.

I'm still not sure that I understand exactly what you mean by
"a punycode label"; is that the DNS label (comprised of LDH),
or some hexadecimal codes, or something else?  [LDH characters
are (a subset of) ASCII...]
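
To make the question concrete, here is a minimal sketch in
Python of one reading: "a punycode label" as an LDH label that
carries the "xn--" prefix. The helper names is_ldh and
decode_ace are hypothetical, chosen for illustration only, and
the sketch skips the nameprep/normalization steps that a real
IDNA implementation performs.

```python
import string

# Letters, digits, hyphen: the only characters a DNS hostname label may use.
LDH = set(string.ascii_letters + string.digits + "-")
ACE_PREFIX = "xn--"  # prefix that marks a punycode-encoded label


def is_ldh(label):
    """True if the label is non-empty and uses only LDH characters."""
    return bool(label) and set(label) <= LDH


def decode_ace(label):
    """Return the Unicode form of an 'xn--' label; other labels pass through."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    # Python's built-in 'punycode' codec decodes the part after the prefix.
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
```

Under that reading the encoded label is itself plain LDH (so
is_ldh accepts it), while the decoded form generally is not.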

> Let me try to say this carefully.  The "intent behind IDN" is to
> permit people to use local languages and characters in what
> appears to them to be DNS labels.

OK, I'll go around that minefield for the moment.

> Until and unless every one of 
> us has a keyboard that permits easy input of every Unicode
> character (and I don't mean by knowing and typing in it code
> point position) and the knowledge and character
> perception/discrimination ability needed to use such a magical
> keyboard [...]

A couple of observations:

1. I have in mind a keyboard on a certain device which has
   support for protocols which use domain names (HTTP, SMTP/
   Internet Message Format, VPIM).  It has a keyboard which
   is at best inconvenient for entry of ASCII text. Unicode
   "text" (see below for an explanation of the scare quotes)
   is unthinkable.  That device is a cell phone.  I have in
   mind another device with a keyboard (a PDA). It also has
   support for protocols which use domain names (all of the
   above plus VNC, FTP, TELNET, SSH, and probably a few
   others that I don't recall). The keyboard has no question
   mark key or escape key, and no convenient way to enter
   those characters short of menus etc. in specific
   applications.  Unicode, likewise, is unthinkable.  I am
   once again reminded of RFC 1958 (section 3.1); clearly
   somebody has lost sight of the issues discussed therein --
   huge Unicode equivalence/normalization/whatever tables
   simply won't fit in some devices.
2. Once upon a time, Unicode had Design Principles; I quote
   from Table 2.1 as it appeared in early Unicode Versions:
   "Sixteen-bit character codes | Unicode characters have a
   width of 16 bits."
   "Plain text | The Unicode Standard encodes plain text."
   The accompanying text went on: "Graphologies unrelated to
   text, such as musical and dance notations, are outside the
   scope of the Unicode Standard."  All of which sounded
   promising.  Well, those design principles have long been
   abandoned.  More recent versions of Unicode have added --
   you guessed it -- musical notations, etc.   Unicode
   adhering to the early design principles might have had a
   chance of fitting into small, low-power, mobile devices.
   But with expansion of the code space by more than an order
   of magnitude that's impractical.  Not to mention the
   problems with incompatible versions (and I'm not referring
   to "the Korean mess" of RFCs 2279/2781).

> if I am a sensible and cautious user of
> Lower Slobbovian script and I'm sending an IDN or IRI on paper
> to a user who is not familiar with that script, I'm going to
> send the punycode or URI form along as a safety precaution.
> YMMD, of course, and you might plausibly prefer to let only
> people who know and can read and type your script get to your
> content.

Or adhere to the design principles mentioned in RFC 2396
section 1.5.

> > I'd add that one approach to the problem would be to undo the
> > encoding, query DNS to get an IP address, then present that
> > (possibly with associated SOA information and reverse domain
> > name lookup); numeric IP addresses aren't going to be mistaken
> > for some random collection of "characters" (in the Unicode
> > sense) or non-numeric glyphs.
> 
> In the discussion above, you made the observation that end users
> are not likely to be good at decoding punycode-containing IDNs
> on sight.  We agree.  Do you think those users are going to be
> better at looking at an IP address and figuring out if it
> belongs to whomever they think it belongs to?

No, which is why I mentioned SOA information (reverse lookup
of the IP to name mapping alone may not work in some cases
(DHCP, etc.) and won't help in others: "yah00.com" -> IP ->
"yah00.com" doesn't help much).  On the other hand, if SOA
information indicates that "yah00.com" is registered to
somebody in China, that's a big indication that something is
fishy.  Of course, registrars will need to be more vigilant
about ensuring that SOA information, whois records, etc. are
correct [and, yes, I am aware that some people intentionally
provide falsified information].

> As you are
> thinking about this, note that the world's most popular
> operating system doesn't support a "dig" or "nslookup" function
> in most of its versions/ variations. [...]

The idea is that the application (e.g. browser) would do the
lookups and display (e.g. in a status area, or perhaps something
like the way some browsers display certificate/cookie information)
the relevant information.
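
A rough sketch of what such an application-side check might
look like, in Python. The function name describe_host and its
injectable resolve/reverse parameters are assumptions made for
illustration, not part of any browser. SOA and whois lookups
are left out because the Python standard library has no call
for them; a real implementation would need a DNS library for
that part.

```python
import socket


def describe_host(name, resolve=socket.gethostbyname,
                  reverse=socket.gethostbyaddr):
    """Collect facts an application could display next to a domain name."""
    decoded = []
    for label in name.rstrip(".").split("."):
        if label.lower().startswith("xn--"):
            try:
                # Undo the punycode encoding of this label.
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                pass  # malformed label: show it as received
        decoded.append(label)

    info = {"queried": name, "decoded": ".".join(decoded),
            "address": None, "reverse": None}
    try:
        # Forward lookup on the name as queried, then reverse lookup.
        info["address"] = resolve(name)
        info["reverse"] = reverse(info["address"])[0]
    except OSError:
        pass  # lookup failed; leave the missing fields as None
    return info
```

The resolver arguments default to the socket module's
functions, so describe_host("example.com") performs real
lookups; passing stand-in callables allows it to be exercised
without network access.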

> However, you
> should be aware already that many, perhaps most, domains (at any
> level of the tree) have created and enforce the names they are
> willing to register
[...]
> if you don't like the rules of one domain, you are free to [...]
> make up your own rules for its subdomains.

And therein lies the gaping loophole in such schemes.

> > I would also like to take this opportunity to repeat an earlier
> > suggestion, viz. that the IAB should update RFC 1958 and give
> > that update some status more substantive than "Informational".
> > In particular, such an update should clearly state that
> > protocol elements are simply that; any resemblance to natural
> > language names, places, or things is purely coincidental.
> 
> Sure.  Who do you think would pay attention to such a statement?

Those who care about doing the right thing.  I believe that
there are developers in that category, but lacking a clear and
authoritative statement of principles, many are easily misled
by misinformation or simply assumptions made in the absence of
facts.

Now you have a valid point that in many respects it's too late
regarding this specific instance of this particular issue.  But
RFC 1958 covers a lot of ground, and is probably overdue for an
update and some reinforcement.  I am dismayed at the poor quality
of engineering behind some recent proposals, and failing some
clear up-to-date architectural guidelines, I suspect that matters
will get worse.

> Like it or not, there is a large population in the real world
> who are not interested in that argument or position.  There are
> even folks who are technically sophisticated enough to
> understand and accept your argument about protocol identifiers
> who nonetheless believe that they should be able to identify
> objects with names or acronyms that have mnemonic value in their
> languages and character sets.

There's an old maxim: "be careful what you ask for; you might
get it".  As already noted, the sort of problem under discussion
was not only predictable, it was predicted as the inevitable
result of IDNs.  So those folks got what they wanted, and the
problems that go hand-in-hand with it.  Unfortunately, everybody
else also suffers from the problems.

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf

