Re: FYI: BOF on Internationalized Email Addresses (IEA)

Valdis.Kletnieks@xxxxxx · Thu, 30 Oct 2003 15:44:06 -0500

On Thu, 30 Oct 2003 09:13:55 PST, niket@xxxxxxxxxxxxxx said:
> Forget Mongolian. Think Chinese and Hindi, plus related languages that
> use their character sets. Between the two of them you have nearly 3
> billion potential users, i.e. half the world's population. Admittedly
> not all of them are literate, and many do understand the Latin character
> set, but this is still a very large group to disenfranchise.

Right.  The as-yet unanswered question is how many would be disenfranchised
by making them learn the Latin charset, compared to how many would
be disenfranchised by an imperfect globalization scheme (see my comment
yesterday regarding macrons and carons).

> There is a second thread to your argument which I object to. Just
> because many Internet users can understand the Latin character set does
> not mean they do not want to send stuff in their native character set,
> or be forced to use the Latin character set. Of course so far we have
> made it impossible to do so.

Note that this is discussing *addresses only*.  We've had charset
support for bodyparts and 2047-encoding for other header fields for *years*.

I get at least 5 or 6 emails a day that have addresses of the form
From: "kanji/big5/etc string here" <romanized.name@xxxxxxxxxxxxxxxxxxxx>
and/or have charset=utf-8 and kanji in them.
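
For anyone who hasn't looked at what that form is on the wire, here is a
minimal Python sketch. It only assumes the standard library's email.utils
and email.header modules; the kanji name and the example.org address are
made-up placeholders, not taken from any real message. The display name gets
2047-encoded, while the addr-spec itself stays plain ASCII.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Placeholder values for illustration only.
display_name = "\u5c71\u7530\u592a\u90ce"   # a kanji display name
address = "romanized.name@example.org"      # the addr-spec stays ASCII

# formataddr() RFC 2047-encodes a non-ASCII display name (UTF-8 by default)
# and leaves the address itself untouched.
from_value = formataddr((display_name, address))
print("From:", from_value)

# Going the other way: split the header value, then decode the encoded-word.
encoded_name, parsed_address = parseaddr(from_value)
decoded_name = str(make_header(decode_header(encoded_name)))
```

The point is that the part you hit "reply" to never leaves ASCII; only the
human-readable label is encoded.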

> Why place unnecessary restrictions on the Internet just because it
> results in messages that you personally can't understand?

An equally important consideration is that it result in messages that
are *usable* (possibly without comprehension). If whatever scheme we
decide on results in messages that I can't hit "reply" to or otherwise
process, it's not doing anybody any favors.

An often overlooked aspect of the ASCII charset is that it has 52 letters
and 10 digits which for the most part are visually distinctive (except for
zero/oh and one/lower-ell), so even a non-speaker can make a determination "have I
entered the same glyphs as are on the business card?". This is not true
for any of the Asian glyph sets (at least *I* can't tell easily), and I
don't think that the Latin 1/A/B extension has this property either,
once you start dealing with macrons, cedillas, ogoneks, carons, dots,
and other ornamentation....

So my question remains:  are we doing the 3 billion Asians a favor by forcing
them to be able to tell the difference between e-caron and e-breve?
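
To make the e-caron/e-breve problem concrete, a small Python sketch using
only the standard unicodedata module. The describe() helper is a hypothetical
name of my own. It shows that the three letters differ only in one small
combining mark sitting on top of the same base 'e'.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, base letter, combining-mark name)."""
    # NFD splits each of these precomposed letters into base + one mark.
    base, mark = unicodedata.normalize("NFD", ch)
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), base, unicodedata.name(mark))

# e-macron, e-caron, e-breve
lookalikes = ["\u0113", "\u011b", "\u0115"]
rows = [describe(ch) for ch in lookalikes]

for code, name, base, mark in rows:
    print(code, name, "=", base, "+", mark)
```

Three different code points, one base letter, and the only thing telling
them apart is a mark a few pixels high.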

Are we doing *anybody* favors if we make them use RFC3490-style xn-- strings
that are totally incomprehensible if they are from outside the local conclave?
Remember - if they don't understand Latin charsets, a 3490-encoded address will
be *painful*, even for the *owner*.  You don't believe me? Take the character
string 'valdis.kletnieks', change the first e to U+0113 (small e-macron), punycode it,
and let me know how much mnemonic value it has.
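
If you want to try that experiment yourself, here is a minimal sketch,
assuming Python's built-in "punycode" and "idna" codecs (the latter
implements the RFC 3490 ToASCII step label by label). Note that RFC 3490
is defined for domain labels, so applying it to a mailbox-style string like
this is purely for illustration.

```python
# The string from above, with the first 'e' replaced by U+0113 (e-macron).
original = "valdis.kl\u0113tnieks"

# Raw Punycode over the whole string: the ASCII characters are copied first,
# then a '-' delimiter, then the encoded position of the non-ASCII character.
whole = original.encode("punycode")

# RFC 3490 style: each dot-separated label is handled separately, and only
# labels containing non-ASCII characters get the "xn--" prefix.
per_label = original.encode("idna")

print(whole.decode("ascii"))
print(per_label.decode("ascii"))

# Both forms decode back to the original string.
assert whole.decode("punycode") == original
assert per_label.decode("idna") == original
```

Run it and look at what is left of the name: the accented letter is gone
from where it belongs and an opaque suffix shows up at the end of the label.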

And remember - the string you get there is the sort of thing that all 3 billion
Asians will get to enter (after I get my sysadmin to set up the aliases to get that
punycode to actually drop into *my* mailbox).

Are you sure it's worth the effort?

It's not that I'm unsympathetic to the goals - far from it.  It's just that I
was there during the RFC2047 wars (which are *still* going on in the spam
world, silly spammers sending around untagged 8-bit headers), and a big part
of me wants to say "Oh no, not again....".
