--On Wednesday, January 07, 2015 11:16 -0800 Jan Pechanec
<jan.pechanec@xxxxxxxxxx> wrote:

> On Sat, 3 Jan 2015, Jan Pechanec wrote:
>
> hi, I haven't received any other comments on the draft
> recently (I know the LC already ended on Dec 29 though) so I
> think I can file changes discussed and drafted in this thread
> as draft 18 on Friday. Thank you all for feedback, I really
> appreciate it.
>
> one more change for the draft 18 (v2 attached) is to spell
> "NFC" and reference the Unicode Annex on normalization based
> on comments from Jaroslav and Christian.
>...

Jan,

I don't have a lot of time to spend on this and am not an expert
on either X.509 or PKCS (#11 or otherwise). At least the first
may be unfortunate, but it is what it is.

While I think the changes you have made are definitely
improvements, this i18n stuff is complicated. As with Security,
there is a completely inadequate supply of magic pixie dust that
can be thrown at problems to make them go away. "Normalize to
NFC" (with spelling-out and references) is a vast improvement
over "use [valid] UTF-8" but there are many other issues. You
have noted some and omitted others.

For example, while case-independent matching is a very simple and
completely deterministic issue for ASCII (one essentially just
masks off one bit within a certain range), it can get very messy
if one tries to be sensitive to different locales that have
different conventions about what to do with diacritical marks
when lower-case characters are converted to upper case. There are
Unicode "CaseFold" rules that are at least self-consistent, but
they contain what amount to exceptions for some language contexts
(e.g., for dotless "i") and they are wildly unpopular in some
places. We used to joke that, every time we tried to carefully
examine a new script and set of languages for IDN-related
purposes, it was like turning over rocks with vipers hiding under
them.
Each new script or language context turned up a different set of
difficult issues -- the only surprise was what sort of creatures
crawled out, not whether there would be creatures there. The joke
has gone out of fashion, but the realities that inspired it
survive.

Part of this is an inherent problem with trying to create a
universal character set -- languages and writing systems are
diverse enough that any "one size fits all" model or set of
decision rules is guaranteed to be deeply problematic and
upsetting for some people (and legitimately so), while developing
too many script-specific (or language-specific) rules or
exceptions is almost certain to upset those who feel a need for
simpler approaches that can be incorporated into generic
software. For your reading pleasure,
draft-klensin-idna-5892upd-unicode70 discusses one set of cases
in which application of different rules and criteria led to a
conclusion that may be just right for some communities but that
is definitely problematic for others.

I don't know how far in explaining this your document should go.
I would urge, as I think I did before, some fairly strong
warnings that, at least until the issues are clarified in PKCS#11
itself, one should be very certain one knows what one is doing
(and what the consequences of one's choices will be) if one
decides to move beyond the safety and general understanding of
the ASCII / ISO 646 / IA5 letter and digit repertoire. That sort
of warning should supplement your NFC language, not replace it --
neither is a substitute for the other. Whether you incorporate it
or not, your I-D should not assume that, by saying "NFC", you
have somehow resolved the full range of issues in this area, any
more than saying "UTF-8" did. For more information, you might
have a look at some of the PRECIS work, notably
draft-ietf-precis-framework.

I also remain convinced that the best place to fix this is in the
PKCS#11 spec itself.
One is always at a disadvantage when trying to work around an
inadequate specification in a different specification that has to
depend on it, and your work is no exception. I wish there were
liaison arrangements between the IETF and others (presumably,
notably, RSA) adequate to be sure that happened, or at least that
there was clear awareness on the PKCS side of the deficiencies.

Happy New Year,
    john
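
The case-matching and NFC points in the message above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not anything taken from the draft or from PKCS#11: the helper names `ascii_fold` and `nfc` and the sample strings are hypothetical, and Python's built-in `str.casefold()` stands in for the default (locale-independent) Unicode case folding.

```python
import unicodedata


def ascii_fold(s: str) -> str:
    """Lower-case only the ASCII letters A-Z by setting bit 0x20 (hypothetical helper)."""
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in s)


def nfc(s: str) -> str:
    """Normalize to Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)


# ASCII: deterministic, one bit within a fixed range of code points.
ascii_match = ascii_fold("PKCS11-Token") == ascii_fold("pkcs11-token")

# NFC: composed and decomposed e-acute differ as code point sequences
# until both sides are normalized.
composed, decomposed = "\u00e9", "e\u0301"
raw_match = composed == decomposed
nfc_match = nfc(composed) == nfc(decomposed)

# Case folding is a separate problem that NFC does not address.
# Dotless i (U+0131) upper-cases to "I", but "I" folds to dotted "i",
# so an upper-case round trip no longer matches the original.
dotless = "\u0131"
round_trip_match = dotless.upper().casefold() == dotless.casefold()

print(ascii_match, raw_match, nfc_match, round_trip_match)
```

Run as written, this should print `True False True False`: the ASCII and NFC comparisons succeed once folded or normalized, while the dotless "i" round trip fails under default case folding, which is the kind of language-specific exception that saying "NFC" alone does not resolve.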