Re: [Last-Call] [I18ndir] last call reviews of draft-ietf-regext-epp-eai-12 (and -15)

Martin J. Dürst <duerst@xxxxxxxxxxxxxxx> · Mon, 26 Sep 2022 16:31:48 +0900

Very sorry to be late with my reply, and for not replying to the latest 
posting from John Klensin in this thread.

On 2022-09-14 04:03, John C Klensin wrote:
> James,
>
> My apologies for not having responded to your note sooner.
> I've been preoccupied with several unrelated things.
>
> I greatly appreciate the changes to use an existing EPP
> extension framework and to correct the terminology error of
> EAI -> SMTPUTF8.   I agree that the more substantive SMTPUTF8
> technical issues should go back to the WG.
>
> However, in order that the discussion you suggest for IETF 115
> be useful and not just lead to another round of heated Last Call
> discussions, I think that, for the benefit of those who have
> been following the discussion closely and those who should have
> been, it is important to be clear about what the disagreement is
> about.  When you characterize the issue as "e-mail cardinality",
> it makes it sound, at least to me (maybe everyone in the WG has
> a better understanding) like this is some subtle technical
> matter.
>
> It really isn't.  The EAI WG was very clear during the
> development of the SMTPUTF8 standards that the biggest problems
> with non-ASCII email addresses were going to be with user agents
> (MUAs) (and, to some degree, with IMAP and POP servers that are
> often modeled as part of MUAs) and not with SMTP transport over
> the Internet.  Making an MUA tailored to one particular language
> and script (in addition to ASCII), or even a handful of them, is
> fairly easy.  Making one that can deal well with all possible
> SMTPUTF8 addresses is very difficult (some would claim
> impossible, at least without per-language, or
> per-language-group, plugins or equivalent).

I very strongly think that "an MUA that can deal well with all possible 
SMTPUTF8 addresses" is a red herring.

First, as far as backing store (in-memory representation) is concerned, 
any implementation that is able to handle full Unicode and SMTPUTF8 will 
be fine; there's no dependency there on natural languages or scripts. 
And because most MUAs these days use user-interface tool kits or 
OS components that support Unicode, that part may be essentially free 
for most of them. This leaves the logic of "if non-ASCII in LHS of 
email address, then use SMTPUTF8, otherwise not" and the transcoding 
from the internal Unicode representation (possibly UTF-16) to and from 
UTF-8 (available as a library function). So on this level, an MUA that 
is able to deal with SMTPUTF8 is able to deal with all possible SMTPUTF8 
addresses, or otherwise it's very badly written.
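
The logic described above can be sketched roughly as follows. This is 
only an illustration in Python, not taken from any actual MUA; the 
function names (needs_smtputf8, to_wire, from_utf16) are hypothetical, 
and the sketch follows the rule as stated here, looking only at the LHS 
and leaving any handling of the domain part aside.

```python
def needs_smtputf8(address: str) -> bool:
    """Return True if the LHS (local part) contains non-ASCII characters.

    Hypothetical helper: it only implements the "non-ASCII in LHS" rule
    and does not validate the address or inspect the domain part.
    """
    local_part, sep, _domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in address: {address!r}")
    return not local_part.isascii()


def from_utf16(raw: bytes) -> str:
    """Decode an address held internally as UTF-16 (little-endian assumed)."""
    return raw.decode("utf-16-le")


def to_wire(address: str) -> bytes:
    """Transcode the internal Unicode representation to UTF-8."""
    return address.encode("utf-8")


if __name__ == "__main__":
    for addr in ("user@example.com", "用户@example.com"):
        # No per-language or per-script branch is needed anywhere here.
        print(addr, needs_smtputf8(addr), to_wire(addr))
```

Nothing in this sketch depends on which script the address uses; the 
same two steps apply to any Unicode string.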

Second is the level of display. Here again, it's important to understand 
that MUA implementers will just use a tool kit, which includes a 
rendering library (such as harfbuzz) that takes care of all the glyph 
selection and shaping details. And it will use (via that library) the 
fonts available on the OS. If the necessary font is not available (e.g. 
for scripts just recently added to Unicode), then square boxes or 
question marks or something similar will be displayed, but it should 
still be possible to copy an address from a browser to an 
(SMTPUTF8-capable) MUA and send the mail. The same goes for rendering 
variations: the browser may show a frog with a tongue, but the MUA may 
show a frog followed by a tongue. If that's the result of copy-paste, 
the mail should still be delivered correctly.

[It is important to note here that these days, the number of email 
addresses that get copied by hand from a napkin or business card to an 
MUA is way down, and copying from one application (e.g. a browser) to 
another is the norm.]

Third, there's a saying "the better is the enemy of the good". It can be 
abused to justify sloppiness, but in the area of internationalization, 
it's very important. If somebody wants to use a Cyrillic or Devanagari 
or Han (Chinese/Japanese) or Greek,... email address, they don't care 
whether a script such as Nag Mundari (new in Unicode 15.0.0, out on 
September 13) or some Egyptian hieroglyph format controls (also new in 
Unicode 15.0.0) or even some Devanagari characters used to represent 
auspicious signs found in inscriptions and manuscripts (ditto) are 
available. Because of the very long tail of languages, scripts, and 
characters, a requirement that "all possible SMTPUTF8 addresses" are 
covered is very counterproductive. It denies the huge majority of people 
interested in such addresses something because there may be others who 
aren't yet able to get it, and in turn only causes additional delay 
for everybody.

So my conclusion for the draft in question is that allowing more than 
one email address won't hurt, and saying that one of them can be used 
as an all-ASCII fallback won't hurt either, but holding the draft back 
if these changes are not made isn't really justified.

Regards,   Martin.
