[Last-Call] Re: Last Call: <draft-klensin-idna-rfc5891bis-09.txt> (Internationalized Domain Names in Applications (IDNA): Registry Restrictions and Recommendations) to Proposed Standard

John C Klensin <john-ietf@xxxxxxx> · Thu, 20 Feb 2025 02:43:31 -0500

One small clarification while I try to find time to respond
constructively to other messages...

--On Thursday, February 20, 2025 12:34 +1300 Brian E Carpenter
<brian.e.carpenter@xxxxxxxxx> wrote:

>>> But I don't see that the present draft has any duty to provide a
>>> full threat analysis; illustrative threats are enough to justify
>>> preemptive counter-measures today.
>> 
>> Strong no again. When you propose things that will limit the
>> possibilities of the Internet users, you need to back that with a
>> serious threat analysis.
> 
> I think the threat analysis for look-alikes is obvious, but your
> point seemed to be lack of evidence for homographs in the wild. IMHO
> that's beside the point; it's the *possibility* of homograph abuse
> that is the threat, and limiting that possibility should be a shared
> goal.

Yes.  However, if the only issue were homographs, I don't think
either Asmus or I would have bothered with this document.  I think
almost any ten-year-old could answer a question like "do these two
strings look alike" and get it right most of the time.  I assume a
well-trained AI pattern recognition engine would not do much worse.
That does, of course, interact with one of the points of this draft
(and of IDNA2008) which is whether there is any obligation for domain
name registries to detect such situations and block the registrations
in the interest of a well-functioning Internet with domain names as
important identifiers.

However, there are other cases -- complex scripts and code point
relationships within them, bidi scripts (especially labels containing
mixtures of left-to-right and right-to-left characters), and so on.
For those cases, the best rule is probably close to "don't allow any
string containing code points you don't understand to be registered".
When that is impractical for whatever reason, the second-best rule is
"find someone who understands those code points, or some
recommendations for them and for the scripts (and probably languages)
with which they are associated, and use that as a starting point for
specifying filters that avoid names that would obviously be unsafe or
unwise if only you knew enough for them to be obvious".
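
As an illustration only, the first rule above could be sketched in
Python roughly as follows.  The allowlist contents and all function
names are assumptions made for this example, not anything specified
by the draft or by IDNA2008, and the direction check is far cruder
than the IDNA2008 Bidi rule.

```python
import unicodedata

# Hypothetical allowlist: the code points this registry claims to
# understand. A real registry would derive this from its own policy.
UNDERSTOOD = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-")


def unknown_code_points(label, understood=UNDERSTOOD):
    """List the code points in label that are outside the allowlist."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in understood]


def mixes_directions(label):
    """True if label has both left-to-right and right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return "L" in classes and bool(classes & {"R", "AL"})


def may_register(label, understood=UNDERSTOOD):
    """Refuse labels with unfamiliar code points or mixed directions."""
    return not unknown_code_points(label, understood) and not mixes_directions(label)
```

With the ASCII-only allowlist assumed here, a label containing
CYRILLIC SMALL LETTER A in place of "a" is refused simply because the
registry has not said it understands that code point; no look-alike
analysis is needed.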

Again, look-alikes are part of that picture, but not the most
important part.

Or, as I have tried to say in other notes, one can just argue that it
is a dangerous world out there, that users who can't take the trouble
to understand the strings that make up the labels they encounter are
volunteers to be victims, and that is their problem, not the
Internet's or anyone else's.  I obviously don't believe that, but, if
it is the IETF's consensus, then, IMO, we should drop this draft and
probably every other effort to define what code points or character
sequences are reasonable for use in DNS labels or elsewhere... and we
may not want any more best practice documents that say "this is a
good idea and that isn't" (whether character strings are involved or
not) because the same logic probably applies to them.

    john
