On Wed, Dec 31, 2014 at 07:29:47AM +0100, Patrik Fältström wrote:
>
> On 30 dec 2014, at 20:53, Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:
> > Better say
> > nothing, because I think the thing to do is obvious enough, but if we
> > must say anything, it's that the various strings (e.g., token manuf)
> > are to be compared normalization-insensitively.
>
> Sorry, but I have not heard the term "normalization-insensitively" before.
>
> Can you explain what you mean?

Notionally, if you're comparing two unnormalized strings, you could
normalize both then compare the two normalized strings.

Of course, that can be inefficient (e.g., if it means allocating memory,
or if they will prove not equal in the first few codepoints) or
infeasible (e.g., if one of the strings is actually a hashed key to a
hash table).

What you can do for the first case is compare code-unit by code-unit,
with a fast path for the cases that need no normalization, and
normalizing one character (but possibly multiple codepoints, of course)
at a time. This limits the total memory consumption, and anyways, for
the common case you can often expect an inequality result long before
you're done traversing the shorter string. This is (can be, if you do
it right), of course, equivalent to normalizing both strings then
comparing -- but it should usually be much faster.

For the second case the thing to do is to normalize the key at hash
time, naturally.

[ZFS, incidentally, supports this for filesystem object names, and has
for years now.]

Now, PKCS#11 nowadays supports UTF-8 for things like "token label", but
it doesn't say anything about form -- why should it (see below)? But
where PKCS#11 URIs are intended to _match_ PKCS#11 resources by name...
apps will need to care about normalization.

In practice, as with a great many applications, doing nothing about
normalization will probably work fine (until the day that it doesn't).
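
To make the incremental comparison above concrete, here is a rough
sketch in Python. It is only an illustration, not anything PKCS#11 or
the draft specifies: the function names are made up, NFD is an arbitrary
choice of form, and it compares codepoints rather than UTF-8 code units.
It assumes a segment can safely start at any character whose canonical
decomposition begins with a starter (combining class 0), since canonical
reordering never moves a mark across a starter.

```python
import unicodedata
from itertools import zip_longest


def _starts_segment(ch: str) -> bool:
    """True if ch's canonical decomposition begins with a starter (ccc == 0)."""
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0]) == 0


def nfd_stream(s: str):
    """Lazily yield the codepoints of NFD(s), one small segment at a time."""
    i, n = 0, len(s)
    while i < n:
        # Extend the segment until the next character that can start one.
        j = i + 1
        while j < n and not _starts_segment(s[j]):
            j += 1
        segment = s[i:j]
        if j - i == 1 and segment < "\x80":
            yield segment  # fast path: a lone ASCII character needs no work
        else:
            yield from unicodedata.normalize("NFD", segment)
        i = j


def equal_normalization_insensitive(a: str, b: str) -> bool:
    """Compare a and b as if both had been normalized first."""
    if a == b:
        return True  # identical inputs need no normalization at all
    for x, y in zip_longest(nfd_stream(a), nfd_stream(b)):
        if x != y:  # also catches one stream ending early (x or y is None)
            return False
    return True


def normalized_key(s: str) -> str:
    """For the hash-table case: normalize once, at hash/insert time."""
    return unicodedata.normalize("NFD", s)
```

With this, equal_normalization_insensitive("caf\u00e9", "cafe\u0301")
is true, while "abc" vs. "abd" stops at the third codepoint without
normalizing anything beyond that point.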

But saying anything about this could be tricky: what if there are two
tokens with equivalent labels, just in different forms? Fortunately
PKCS#11 URIs can match on more attributes than labels, so there's that.

PKCS#11 should say "don't do that" or "don't do that, normalize to NFC"
(or NFD, whatever), but doesn't (or I didn't find where it does, if it
does), so the most that this document could say is "compare
normalization-insensitively where possible".

Nico
--