Re: PKCS#11 URI slot attributes & last call

Patrik Fältström <paf@xxxxxxxxxx> · Wed, 31 Dec 2014 08:33:28 +0100

> On 31 dec 2014, at 08:03, Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:
> 
> On Wed, Dec 31, 2014 at 07:29:47AM +0100, Patrik Fältström wrote:
>>> On 30 dec 2014, at 20:53, Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:
>>> Better say
>>> nothing, because I think the thing to do is obvious enough, but if we
>>> must say anything, it's that the various strings (e.g., token manuf)
>>> are to be compared normalization-insensitively.
>> 
>> Sorry, but I have not heard the term "normalization-insensitively" before.
>> 
>> Can you explain what you mean?
> 
> Notionally, if you're comparing two unnormalized strings, you could
> normalize both then compare the two normalized strings.
> 
> Of course, that can be inefficient (e.g., if it means allocating memory,
> or if they will prove not equal in the first few codepoints) or
> infeasible (e.g., if one of the strings is actually a hashed key to a
> hash table).
> 
> What you can do for the first case is compare code-unit by code-unit, with
> a fast path for the cases that need no normalization, and normalizing
> one character (but possibly multiple codepoints, of course) at a time.
> This limits the total memory consumption, and anyways, for the common
> case you can often expect an inequality result long before you're done
> traversing the shorter string.  This is (can be, if you do it right), of
> course, equivalent to normalizing both strings then comparing -- but it
> should usually be much faster.
> 
> For the second case the thing to do is to normalize the key at hash
> time, naturally.
> 
> [ZFS, incidentally, supports this for filesystem object names, and has
> for years now.]
> 
> Now, PKCS#11 nowadays supports UTF-8 for things like "token label", but
> it doesn't say anything about form -- why should it (see below)?
> 
> But where PKCS#11 URIs are intended to _match_ PKCS#11 resources by
> name... apps will need to care about normalization.  In practice, like a
> great many applications, doing nothing about normalization will probably
> work fine (until the day that it doesn't).  But saying anything about
> this could be tricky: what if there are two tokens with equivalent
> labels, just in different forms?  Fortunately PKCS#11 URIs can match on
> more attributes than labels, so there's that.
> 
> PKCS#11 should say "don't do that" or "don't do that, normalize to NFC"
> (or NFD, whatever), but doesn't (or I didn't find where it does, if it
> does), so the most that this document could say is "compare
> normalization-insensitively where possible".

OK, so are you saying that the side that calculates whether there is a match can apply whatever normalization it wants to the string(s)? Or are you saying that whoever does the matching should not normalize at all, because the application (on the client side) can and should define what normalization (in a broader sense, not only Unicode Normalization) is to be applied?

In IDNA2008, as you know, we chose the latter, but we recommend that applications define which normalization to perform, and that NFC be the Unicode Normalization form used.
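
For concreteness, here is a minimal sketch of what "normalization-insensitive" matching could look like, assuming NFC is the chosen form. It covers the "normalize both, then compare" reading (with only a trivial ASCII fast path, not the character-at-a-time comparison described in the quoted text) and the "normalize the key at hash time" case. The names equal_normalization_insensitive, matching_labels and NormalizedKeyTable are made up for illustration and are not taken from PKCS#11 or from the URI draft.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (assumed form for this sketch)."""
    return unicodedata.normalize("NFC", s)


def equal_normalization_insensitive(a: str, b: str) -> bool:
    """Compare two strings as if both had been normalized to NFC first."""
    # Fast path: ASCII-only strings are unchanged by NFC, so no allocation is needed.
    if a.isascii() and b.isascii():
        return a == b
    return nfc(a) == nfc(b)


def matching_labels(wanted: str, labels: list[str]) -> list[str]:
    """Return every label equivalent to `wanted`; more than one hit means ambiguity."""
    w = nfc(wanted)
    return [label for label in labels if nfc(label) == w]


class NormalizedKeyTable:
    """Hash table that normalizes keys when they are hashed (hypothetical helper)."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._items[nfc(key)] = value

    def get(self, key: str, default: object = None) -> object:
        return self._items.get(nfc(key), default)


# The same visible label in two forms: precomposed U+00E9 vs. "e" + combining U+0301.
composed = "Tok\u00e9n"
decomposed = "Toke\u0301n"
```

With these two strings, a plain == comparison reports a mismatch while equal_normalization_insensitive reports a match, and matching_labels returns both labels when a slot list contains both forms, which is the ambiguous case raised in the quoted text.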

   Patrik
