RE: IETF.org does its part on internationalization

Bernard Aboba <bernard_aboba@xxxxxxxxxxx> · Sun, 7 Sep 2008 15:48:17 -0700

>> .....Due to the ASCII character encoding being the core/monopoly

>This is a bad start: non-ASCII characters are used on the Internet for
>many years. There is certainly an ASCII *bias* in many Internet
>protocols, applications or deployments but if there was an ASCII
>*monopoly*, it ended a long time ago.

From draft-ietf-idnabis-protocol-03.txt Section 6.1:

   The current update to the definition of the DNS protocol [RFC2181]
   explicitly allows domain labels to contain octets beyond the ASCII
   range (0000..007F), and this document does not change that.  Note,
   however, that there is no defined interpretation of octets 0080..00FF
   as characters.  If labels containing these octets are returned to
   applications, unpredictable behavior could result.  The A-label form,
   which cannot contain those characters, is the only standard
   representation for internationalized labels in the current DNS
   protocol.

As noted above, the DNS protocol does not prohibit carrying non-ASCII
characters; the issue is how applications respond when they receive such
characters in responses.  Presumably applications written to Unicode APIs
such as GetAddrInfoW are capable of handling UTF-8 in responses, and
indeed there are many such applications (e.g. applications depending on
the .NET/Mono DNS classes).
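
To make the A-label point concrete, here is a minimal sketch of the conversion an application performs before handing a name to the resolver.  It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490) rather than the IDNAbis drafts quoted above; the helper names and the sample name "bücher.example" are purely illustrative.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA2003
# (RFC 3490), not the IDNAbis drafts. Helper names are hypothetical.

def to_a_label(name: str) -> str:
    """Convert a domain name containing U-labels to its A-label form."""
    return name.encode("idna").decode("ascii")

def is_ascii_only(name: str) -> bool:
    """True if every character lies in the ASCII range 0000..007F."""
    return all(ord(ch) < 0x80 for ch in name)

u_name = "bücher.example"
a_name = to_a_label(u_name)

print(a_name)                  # xn--bcher-kva.example
print(is_ascii_only(u_name))   # False
print(is_ascii_only(a_name))   # True
```

The A-label form never contains octets in the 0080..00FF range, which is why it sidesteps the "no defined interpretation" problem described in Section 6.1.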

> > presently you cannot have domain names that are multilingual, for
> > example: japanese and english language mixed character domain names,
> > hindi and english language mixed character domain names etc. 
> 
> Since it is an IETF mailing list, I will focus on what depends on
> IETF, technical standards. There is *nothing* in the current IDN
> standard (machine names in Unicode) that forbids such mixes. You may
> refer to bad policies like ICANN IDN Guidelines, which apparently
> forbid mixing scripts, but this had nothing to do with the IETF,
> nothing to do with the protocols.

From draft-ietf-idnabis-rationale-01.txt Section 14:

   To help prevent confusion between characters that are visually
   similar, it is suggested that implementations provide visual
   indications where a domain name contains multiple scripts.  Such
   mechanisms can also be used to show when a name contains a mixture of
   simplified and traditional Chinese characters, or to distinguish zero
   and one from O and l.  DNS zone administrators may impose
   restrictions (subject to the limitations identified elsewhere in this
   document) that try to minimize characters that have similar
   appearance or similar interpretations.  It is worth noting that there
   are no comprehensive technical solutions to the problems of
   confusable characters.  One can reduce the extent of the problems in
   various ways, but probably never eliminate it.  Some specific
   suggestions about identification and handling of confusable
   characters appear in a Unicode Consortium publication
   [Unicode-UTR36].

This is *not* a prohibition, but rather a suggestion; Section 4 of the document contains no restriction on the registration of labels with mixed scripts.  Similar advice can be found in RFC 3490 Section 10. 
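
As an illustration of the kind of "visual indication" the draft suggests, an application could flag labels that mix scripts rather than reject them.  The sketch below is a rough heuristic and not anything the drafts specify: the Python standard library does not expose the Unicode Script property, so it assumes the first word of each letter's Unicode character name is a usable stand-in for its script.  The function names are hypothetical.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name (LATIN,
    CYRILLIC, DEVANAGARI, CJK, ...) stands in for the script. Digits,
    hyphens and combining marks are ignored.
    """
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts

def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in_label(label)) > 1

# U+0430 CYRILLIC SMALL LETTER A looks like a Latin "a" in most fonts.
spoof = "ex\u0430mple"
print(sorted(scripts_in_label(spoof)))   # ['CYRILLIC', 'LATIN']
print(is_mixed_script(spoof))            # True
print(is_mixed_script("example"))        # False
```

A check like this only drives a warning or a display hint; as the draft says, there is no comprehensive technical solution to confusable characters, and legitimate names (Japanese plus English, Hindi plus English) will also be flagged.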

> > Another example, there is not much browser / URL bar integration and
> > usability innovation that allow for a non-ASCII language domain name
> > to stay non-ASCII script on the browser / URL bar without it
> > changing to Punycode.  

From draft-ietf-idnabis-rationale-01.txt Section 7.2:

   Applications MAY
   allow the display and user input of A-labels, but are encouraged to
   not do so except as an interface for special purposes, possibly for
   debugging, or to cope with display limitations.  A-labels are opaque
   and ugly, and, where possible, should thus only be exposed to users
   and in contexts in which they are absolutely needed.  Because IDN
   labels can be rendered either as the A-labels or U-labels, the
   application may reasonably have an option for the user to select the
   preferred method of display; if it does, rendering the U-label should
   normally be the default.

Indeed, there are browsers (e.g. Safari) that actually follow this advice (and provide a more pleasant user experience as a result).  
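
The display policy in Section 7.2 is simple enough to sketch.  The following assumes Python's built-in "idna" codec (IDNA2003, RFC 3490) for the A-label to U-label conversion; the function name and its prefer_unicode parameter are hypothetical and do not describe how any particular browser is implemented.

```python
def display_form(a_name: str, prefer_unicode: bool = True) -> str:
    """Choose how to render a domain name received in A-label form.

    Rendering the U-label is the default; the A-label is shown only
    when the caller asks for it (e.g. for debugging) or when the name
    cannot be converted.
    """
    if not prefer_unicode:
        return a_name
    try:
        # Sketch only: the "idna" codec implements IDNA2003.
        return a_name.encode("ascii").decode("idna")
    except UnicodeError:
        # Fall back to the name as received rather than failing.
        return a_name

print(display_form("xn--bcher-kva.example"))
# bücher.example
print(display_form("xn--bcher-kva.example", prefer_unicode=False))
# xn--bcher-kva.example
```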
