Re: [idn] Re: 7 bits forever!

John C Klensin <klensin@jck.com> · Fri, 05 Apr 2002 14:41:53 -0500

--On Friday, 05 April, 2002 22:53 +0700 Robert Elz
<kre@munnari.OZ.AU> wrote:

>     Date:        Thu, 4 Apr 2002 09:50:01 -0800 (PST)
>     From:        "Gary E. Miller" <gem@rellim.com>
>     Message-ID:
> <Pine.LNX.4.44.0204040931110.10828-100000@catbert.rellim.com>
> 
>   | Maybe it can, but that does not make it right.
>   | 
>   | RFC 1035 "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION"
>   | 
>   | 2.3.1
> 
> If you actually go read that section, carefully, instead of
> just quoting the part from it that everyone notices first, you
> will see that it says something quite different from what you
> think it does.
> 
> You need to read the part of the section that appears on the
> preceding page of the formatted RFC...
> 
> Or see (part of) rfc2181 for a longer version of this.

Actually, having read that section, and several other sections,
_very_ carefully in recent months, I think 2181 is contradictory
at best, and possibly seriously wrong, on this point.

As I read them, what 1034 and 1035 say is that the DNS can
accommodate any octets, but that [at least then]
currently-specified RRs are restricted to ASCII.  The LDH rule
is a good ("best"?) practices one.  It is the LDH rule that RFC
1123 modified slightly.  And it is quite correct to assert that
the LDH rule is not a _DNS_ requirement.

But the ASCII rule is a firm requirement.  For evidence of this,
temporarily ignore the text (although, personally, I think it is
clear -- especially in 2.3.3 -- if read carefully) and examine
the requirement that, for the defined RRs, labels and queries be
compared in a case-insensitive way.  For ASCII, that is a
well-defined operation, one that can be performed by doing the
comparison under a bit mask.  For other scripts, as the IDN WG
discovered, "case insensitive comparison" is typically not
completely well-defined, often involves complex tables and/or
knowledge of local context, and is sometimes quite controversial
as to what is intended.
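
To make the bit-mask point concrete, here is a minimal sketch in
Python. It is an illustration only: the names CASE_BIT and
mask_equal are hypothetical and come from no RFC or resolver
implementation. It relies on the fact that ASCII upper- and
lower-case letters differ only in bit 0x20.

```python
CASE_BIT = 0x20  # the single bit separating 'A'..'Z' from 'a'..'z' in ASCII


def mask_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels octet by octet, ignoring bit 0x20 everywhere.

    Hypothetical helper. It is only meaningful when both inputs are
    ASCII letters, digits, hyphens and dots, because the mask is
    applied blindly to every octet, not just to letters.
    """
    if len(a) != len(b):
        return False
    return all(((x ^ y) & ~CASE_BIT & 0xFF) == 0 for x, y in zip(a, b))


print(mask_equal(b"WWW.Example.COM", b"www.example.com"))
print(mask_equal(b"example", b"exemple"))
```

Note that the mask is cruder than a true case fold even inside
ASCII (it also equates "@" with "`"), which is harmless only
because such octets do not occur in LDH names.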

So I believe that the "future RRs" language with regard to
binary labels in 1034 and 1035 must be taken seriously and as
normative text: if new RRs (or new classes) are defined, they
can be defined as binary and, hence, as not requiring
case-insensitive comparisons.  Conversely, within the current
set (or at least the historical set at the time of 1034/1035),
case-insensitive comparison is required and hence binary must
not be permitted.

Any other reading, I believe, leads immediately either to
contradictions or to undefined states within the protocol.

As an aside, it appears to me that this requirement for
case-insensitive comparison is the real problem with "just put
UTF-8 in the DNS" approaches.  An existing and conforming
implementation has no way to do those required case-insensitive
comparisons outside the ASCII range.  Worse, if it does those
comparisons by bit-masking (which would be conforming today),
there is a risk of its getting rather bizarre errors (of either
matching or not matching) on characters outside the ASCII range.
One supposes that we could modify the protocol to specify that
case-insensitive comparisons be made only for octets in the
ASCII range, but, unless that were done through an EDNS option,
it would be a potentially fairly significant retroactive change.
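
A small sketch of both failure modes, again in Python and again
with hypothetical names (mask_equal, ascii_fold_equal) that are
not taken from any specification. It compares UTF-8 encoded
strings first under a blind 0x20 mask and then under a fold
restricted to the octets for "A" through "Z", which is roughly
what the modification suggested above would amount to.

```python
def mask_equal(a: bytes, b: bytes) -> bool:
    """Blind bit-mask comparison: ignore bit 0x20 in every octet."""
    return len(a) == len(b) and all(
        ((x ^ y) & 0xDF) == 0 for x, y in zip(a, b)
    )


def ascii_fold_equal(a: bytes, b: bytes) -> bool:
    """Fold only octets 0x41..0x5A ('A'..'Z'); compare the rest exactly."""
    def fold(octet: int) -> int:
        return octet | 0x20 if 0x41 <= octet <= 0x5A else octet
    return len(a) == len(b) and all(fold(x) == fold(y) for x, y in zip(a, b))


times = "\u00d7".encode("utf-8")    # multiplication sign, octets C3 97
divide = "\u00f7".encode("utf-8")   # division sign, octets C3 B7
ya_upper = "\u042f".encode("utf-8") # Cyrillic capital YA, octets D0 AF
ya_lower = "\u044f".encode("utf-8") # Cyrillic small ya, octets D1 8F

# Blind mask: two unrelated symbols compare equal (false match) ...
print(mask_equal(times, divide))
# ... while a genuine upper/lower pair does not (false non-match).
print(mask_equal(ya_upper, ya_lower))

# ASCII-only fold: non-ASCII octets are compared exactly.
print(ascii_fold_equal(times, divide))
print(ascii_fold_equal(b"EXAMPLE", b"example"))
```

The ASCII-only fold makes no attempt at case matching outside
ASCII; it merely avoids the accidental matches, at the cost of
changing what existing implementations are permitted to do.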

    john