--On Wednesday, 31 December, 2014 02:25 -0600 Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:

> On Wed, Dec 31, 2014 at 08:54:00AM +0100, Patrik Fältström
> wrote:
>> What I think is then needed is for this case:
>>
>> 1. A simple explanation what you really is talking about
>>
>> What is the requirement on whom regarding
>> normalization/mapping/whatever?
>
> The I-D in question defines a URI scheme for PKCS#11 resources
> some of whose naming attributes are character strings which
> PKCS#11 says should be UTF-8.  PKCS#11 (*not* an Internet
> standard) does not say anything about form.  Should this I-D
> say anything about form?
>
> IMO the most it should say is "PKCS#11 doesn't specify a
> canonical form for these labels, therefore the application may
> need to canonicalize prior to comparing them".  The
> alternative is to say nothing.

Nico, commenting on this issue only and doing it in more general terms (going a bit beyond Patrik's "IETF have in many cases created profiles..."):

The conventions for IETF-approved publications include that they are supposed to support interoperability and that features or characteristics that would interfere with interoperability are grave defects.  This is especially true of Standards Track documents, where 2026 very clearly makes "known technical omissions" grounds for rejection absent specific reasons for waiving that requirement.  At least by convention for nearly two decades, the IESG reaching such a conclusion requires clear documentation of the defect and of the reason for making an exception, in the specification and usually in the Last Call.  Nowhere in our procedures is there any provision for a standards-track document to get a waiver because some other standards body got sloppy, did something that wouldn't meet our standards, or didn't quite understand the implications of what they were doing.

Now, we've had a lot of specs written on the assumption that a sufficient path to internationalization of a protocol designed around ASCII (or an ISO 646 profile, or IA5) was "just say 'UTF-8' where we used to say 'ASCII', use UTF-8, and go merrily on your way".  After a while, with help from both experience and some of our friends, we figured out that wasn't a good idea, and various specs now, appropriately, push back on anything resembling "just use UTF-8".  A statement like "It's all UTF-8, all the time, so that form-insensitive can work" (from your earlier note, not the spec) is an example of "just use UTF-8" thinking.

In addition, we have an often-ignored requirement for an "Internationalization Considerations" section when a document touches on i18n issues (see Section 6 of RFC 2277).  Personally, I don't think that is very important for documents that really address the i18n topics, but it is extremely important when, e.g., the spec doesn't really address the i18n issues yet repeatedly says things like "...in environments that are not strictly limited to US-ASCII".  Without specific instructions (and I can find none on quick skimming), that is dealing with i18n considerations by aggressive handwaving.  One of the more impressive examples of this is "...an implementer ought to use the spirit rather than the letter of the rules when generating or parsing these formats in environments that are not strictly limited to US-ASCII."

But the most frequent complaint we hear about i18n from protocol designers in the IETF is similar to "I'm not an expert on this stuff and don't intend to become one; just tell me what to do".  The above does nothing for "just tell me what to do".  It instead implies that the implementer should become enough of an expert to figure out what the implications of "the spirit" actually are.  FWIW, I can't easily figure that out, because there are so many whitespace characters, zero-width things, breaks and non-breaks of various sorts, etc., in Unicode, to say nothing of conventions in various scripts that don't separate "words" with space-like things.  There is some guidance in a few Unicode specs, but they are hard to read and understand, much less apply reasonably to a particular situation, unless one already has a good understanding of the Unicode Standard and some of the issues involved.

Normalization is easily dealt with by making a clear statement.  Historically, our experience has been that the obvious reasonably clear statement is "use NFC".  The growing community opinion (including in the W3C i18n effort, which is much more active than various IETF-related groups) seems to be "don't worry about normalization until comparison (or equivalent) time, because it will have to be done again then anyway to be safe".  You (and the authors) pick, but I agree with Patrik that something needs to be said, unless you take the alternate suggestion below.  But other issues, like the whitespace one called out above, are far more complex and require serious treatment of some sort.
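For concreteness, here is a rough Python sketch of what the "normalize at comparison time" option would look like for PKCS#11 label attributes.  It is only an illustration: the function name is mine, not the draft's, and it assumes the attribute values have already been percent-decoded from the URI into Unicode strings.

import unicodedata

def labels_match(a, b):
    # Compare two PKCS#11 labels form-insensitively by normalizing
    # both sides to NFC at comparison time (NFD would do equally well,
    # as long as the same form is used on both sides).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Precomposed U+00C5 vs. "A" plus combining ring above (U+0041 U+030A):
# different code points, but equal once normalized.
assert "\u00c5ngstr\u00f6m" != "A\u030angstro\u0308m"
assert labels_match("\u00c5ngstr\u00f6m", "A\u030angstro\u0308m")

# A zero-width space (U+200B) or a no-break space (U+00A0) survives NFC,
# so visually similar labels still compare unequal; normalization is not
# a substitute for dealing with the whitespace issues.
assert not labels_match("my\u200btoken", "mytoken")
assert not labels_match("my\u00a0token", "my token")

That is the easy part; deciding whether, e.g., U+00A0 and U+0020 ought to match is exactly the sort of question that "the spirit of the rules" leaves open.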
Alternate suggestion, in the interest of getting this out and recognizing that this is mostly a PKCS#11 problem (presumably ITU and/or RSA, but what do I know) and shouldn't be an IETF one:

(1) Put in an Internationalization Considerations section, which I believe is required anyway.

(2) Indicate that PKCS#11 severely underspecifies the issues associated with characters outside the ASCII repertoire and, especially, with contexts not associated with European languages.

(3) Say that, at least until PKCS#11 is updated to more adequately handle and specify i18n issues, such characters, and certificates that use them, SHOULD NOT be used in or referenced from URIs unless there is a clear need and the issues associated with the characters to be used are clearly understood.

(4) As appropriate, update the handwaving in this document to point to that new section.

That would make it very clear that you are not telling people how to do it and would make the warning as obvious as it should be.

Finally...

> PKCS#11 is an API.  PKCS#11 apps might "interoperate" using
> PKCS#11 URIs communicated over, e.g., IPC (or plain old
> cut-n-paste).
>
> PKC#11 URI _templates_ might well be exchanged far and wide,
> but still not really as a part of a network protocol.

For many years, the IETF had a very strong "we don't do APIs" policy.  That was motivated, at least in part, by the fact that APIs tend to make strong assumptions about programming language and operating system environments, either favoring some over others (a business we didn't want to be in) or not standing the test of time as things outside the IETF evolved.  The view was that we were much better off specifying requirements and protocols and leaving APIs to particular languages, libraries/packages, or operational environments.  Times change, but it seems that many of the times we do APIs, we drop into a rat hole similar to this one, in which we are trying to do an overlay to a spec over which we have little or no control.  Part of the problem is that an API is a somewhat different type of beast from a protocol-style Technical Specification.
If we are going to keep doing these, it may be time to modify/update 2026 to introduce a new class of standards-track document.  Until and unless we are willing to do that, I think we'd better get used to these rough edges and stop pretending that they are good excuses for work that doesn't meet the Technical Specification target criteria.

   john