On Wed, Dec 31, 2014 at 08:33:28AM +0100, Patrik Fältström wrote:
> Ok, so what you say is that the side that is to calculate whether
> there is a match or not can do whatever normalization they want on the
> string(s)? Or do you say that whoever is doing a match is to not do
> normalization at all as the application (on client side) can and
> should define what normalization (in a broader sense, not only Unicode
> Normalization) must be possible to define?

I'm saying something subtly different: when you don't have the luxury
of every string you might chance upon being required to be normalized
to the one true form, then you have three choices:

 - give up

 - go fix whatever needs to be fixed so that you do have that luxury

   Here that would be: PKCS#11 itself, token vendors, and so on.
   I.e., not quite boiling the ocean, but maybe a Great Lake.

 - try your best

Normalization-insensitive comparison falls into the third bucket.

> In IDNA2008, as you know, we did choose the latter, but recommend
> applications to define what normalization to do, and that NFC is the
> Unicode Normalization to use.

For another example, in the world of filesystems we have:

 - most of the world just-uses-8 (UTF-8, ISO-8859-*, whatever; and
   when it's UTF-8, the form is accidental)

 - some of the world insists on UTF-8 (though it's hard for a
   filesystem to enforce this: all it sees is octet strings)

 - some of the world normalizes to NFD (close enough) on create and
   lookup (e.g., HFS+)

 - some of the world is normalization-preserving but form-insensitive
   (ZFS)

The NFD case is obnoxious because even on those systems the input
methods tend to produce NFC. But anyway.

When you have no canonical form, for whatever reason, you can try
form-insensitive matching. Obviously there's aliasing to consider
(but there is anyway), and so on. But none of this is terribly
interesting here except for the "best-effort matching" idea, since
that's probably the user-friendly and not-too-dangerous thing to do
here.

Nico
--
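[For readers unfamiliar with form-insensitive matching: a minimal
sketch in Python, using the stdlib unicodedata module. The function
name is illustrative, not from the thread; real implementations (e.g.,
ZFS) do this at a lower level, but the idea is the same: normalize both
sides to one form just for the comparison, without altering what is
stored.]

```python
import unicodedata

def form_insensitive_equal(a: str, b: str) -> bool:
    # Normalize both operands to NFC only for the comparison;
    # the stored strings themselves are left in whatever form
    # they arrived in (normalization-preserving).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# U+00E9 (precomposed "é") matches "e" + U+0301 (combining acute):
assert form_insensitive_equal("\u00e9", "e\u0301")
```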