--On Wednesday, 31 December, 2014 02:25 -0600 Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:

> On Wed, Dec 31, 2014 at 08:54:00AM +0100, Patrik Fältström
> wrote:
>> What I think is then needed is for this case:
>>
>> 1. A simple explanation what you really is talking about
>>
>> What is the requirement on whom regarding
>> normalization/mapping/whatever?
>
> The I-D in question defines a URI scheme for PKCS#11 resources
> some of whose naming attributes are character strings which
> PKCS#11 says should be UTF-8.  PKCS#11 (*not* an Internet
> standard) does not say anything about form.  Should this I-D
> say anything about form?
>
> IMO the most it should say is "PKCS#11 doesn't specify a
> canonical form for these labels, therefore the application may
> need to canonicalize prior to comparing them".  The
> alternative is to say nothing.

Nico, commenting on this issue only and doing it in more general terms (going a bit beyond Patrik's "IETF have in many cases created profiles..."):

The conventions for IETF-approved publications include that they are supposed to support interoperability and that features or characteristics that would interfere with interoperability are grave defects.  This is especially true of Standards Track documents, where 2026 very clearly makes "known technical omissions" grounds for rejection absent specific reasons for waiving that requirement.  At least by convention for nearly two decades, the IESG reaching such a conclusion requires clear documentation of the defect and of the reason for making an exception, in the specification and usually in the Last Call.  Nowhere in our procedures is there any provision for a standards-track document to get a waiver because some other standards body got sloppy, did something that wouldn't meet our standards, or didn't quite understand the implications of what they were doing.

Now, we've had a lot of specs written on the assumption that a sufficient path to internationalization of a protocol designed around ASCII (or an ISO 646 profile, or IA5) was "just say 'UTF-8' where we used to say 'ASCII', use UTF-8, and go merrily on your way".  After a while, with help from both experience and some of our friends, we figured out that wasn't a good idea, and various specs now, appropriately, push back on anything resembling "just use UTF-8".  A statement like "It's all UTF-8, all the time, so that form-insensitive can work" (from your earlier note, not the spec) is an example of "just use UTF-8" thinking.

In addition, we have an often-ignored requirement for an "Internationalization Considerations" section when a document touches on i18n issues (see Section 6 of RFC 2277).  Personally, I don't think that is very important for documents that really address the i18n topics, but it is extremely important when, e.g., the spec doesn't really address the i18n issues yet repeatedly says things like "...in environments that are not strictly limited to US-ASCII".  Without specific instructions (and I can find none on quick skimming), that is dealing with i18n considerations by aggressive handwaving.  One of the more impressive examples of this is "...an implementer ought to use the spirit rather than the letter of the rules when generating or parsing these formats in environments that are not strictly limited to US-ASCII."

But the most frequent complaint we hear about i18n from protocol designers in the IETF is similar to "I'm not an expert on this stuff and don't intend to become one; just tell me what to do".  The above does nothing for "just tell me what to do".  It instead implies that the implementer should become enough of an expert to figure out what the implications of "the spirit" actually are.  FWIW, I can't easily figure that out, because there are so many whitespace characters, zero-width things, breaks and non-breaks of various sorts, etc., in Unicode, to say nothing of conventions in various scripts that don't separate "words" with space-like things.  There is some guidance in a few Unicode specs, but they are hard to read and understand, much less apply reasonably to a particular situation, unless one already has a good understanding of the Unicode Standard and some of the issues involved.

Normalization is easily dealt with by making a clear statement.  Historically, our experience has been that the obvious reasonably clear statement is "use NFC".  The growing community opinion (including in the W3C i18n effort, which is much more active than various IETF-related groups) seems to be "don't worry about normalization until comparison (or equivalent) time, because it will have to be done again then anyway to be safe".  You (and the authors) pick, but I agree with Patrik that something needs to be said, unless you take the alternate suggestion below.  But other issues, like the whitespace one called out above, are far more complex and require serious treatment of some sort.
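For concreteness, here is a rough Python sketch of what the "normalize at comparison time" option would look like for PKCS#11 label attributes.  It is only an illustration: the function name is mine, not the draft's, and it assumes the attribute values have already been percent-decoded from the URI into Unicode strings.

import unicodedata

def labels_match(a, b):
    # Compare two PKCS#11 labels form-insensitively by normalizing
    # both sides to NFC at comparison time (NFD would do equally well,
    # as long as the same form is used on both sides).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Precomposed U+00C5 vs. "A" plus combining ring above (U+0041 U+030A):
# different code points, but equal once normalized.
assert "\u00c5ngstr\u00f6m" != "A\u030angstro\u0308m"
assert labels_match("\u00c5ngstr\u00f6m", "A\u030angstro\u0308m")

# A zero-width space (U+200B) or a no-break space (U+00A0) survives NFC,
# so visually similar labels still compare unequal; normalization is not
# a substitute for dealing with the whitespace issues.
assert not labels_match("my\u200btoken", "mytoken")
assert not labels_match("my\u00a0token", "my token")

That is the easy part; deciding whether, e.g., U+00A0 and U+0020 ought to match is exactly the sort of question that "the spirit of the rules" leaves open.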
Alternate suggestion, in the interest of getting this out and recognizing that this is mostly a PKCS#11 problem (presumably ITU and/or RSA, but what do I know) and shouldn't be an IETF one:

(1) Put in an Internationalization Considerations section, which I believe is required anyway.

(2) Indicate that PKCS#11 severely underspecifies the issues associated with characters outside the ASCII repertoire and, especially, with contexts not associated with European languages.

(3) Say that, at least until PKCS#11 is updated to more adequately handle and specify i18n issues, such characters, and certificates that use them, SHOULD NOT be used in or referenced from URIs unless there is a clear need and the issues associated with the characters to be used are clearly understood.

(4) As appropriate, update the handwaving in this document to point to that new section.

That would make it very clear that you are not telling people how to do it and would make the warning as obvious as it should be.

Finally...

> PKCS#11 is an API.  PKCS#11 apps might "interoperate" using
> PKCS#11 URIs communicated over, e.g., IPC (or plain old
> cut-n-paste).
>
> PKC#11 URI _templates_ might well be exchanged far and wide,
> but still not really as a part of a network protocol.

For many years, the IETF had a very strong "we don't do APIs" policy.  That was motivated, at least in part, by the fact that APIs tend to make strong assumptions about programming language and operating system environments, either favoring some over others (a business we didn't want to be in) or not standing the test of time as things outside the IETF evolved.  The view was that we were much better off specifying requirements and protocols and leaving APIs to particular languages, libraries/packages, or operational environments.  Times change, but it seems that many of the times we do APIs, we drop into a rat hole similar to this one, in which we are trying to do an overlay to a spec over which we have little or no control.  Part of the problem is that an API is a somewhat different type of beast from a protocol-style Technical Specification.
If we are going to keep doing these, it may be time to modify/update 2026 to introduce a new class of standards-track document.  Until and unless we are willing to do that, I think we'd better get used to these rough edges and stop pretending that they are good excuses for work that doesn't meet the Technical Specification target criteria.

   john