On Tue, Sep 15, 2009 at 12:16:44PM -0400, John C Klensin wrote:
> --On Tuesday, September 15, 2009 15:28 +0100 Kurt Zeilenga
> <Kurt.Zeilenga@xxxxxxxxx> wrote:
>
> > I strongly oppose such an 'or' as SASLprep and Net-UTF-8 uses
> > different Unicode normalization algorithms.
>
> Well, not really.

Really :)

> >...
> > RFC 5198 says 'all character sequences SHOULD be normalized
> > according to Unicode normalization form "NFC" (see Section 3).'
> > RFC 4013 says 'This profile specifies using Unicode
> > normalization form KC, as described in Section 4 of
> > [StringPrep].'
>
> [...]
>
> Now, NFKC processing is a proper superset of NFC processing.

An implementor that stores NFC strings will not reliably interoperate
with a peer that sends query strings in NFKC.  That's because the peer
could send a query string that doesn't match any storage string without
additional normalization of the storage strings!

I think the right answer is to leave _query_ strings unnormalized and
require that _storage_ strings be normalized (see my separate reply on
that general topic, with a different Subject:, just now).

(Nodes that store strings have to have enough normalization code to
validate the normalization of query strings, if query strings are
required to be normalized.  Expecting implementors to normalize query
strings is not that big a deal.  Peers that send query strings will
typically need to be able to normalize too, for local reasons, but
there's no obvious reason why there must be such local reasons.)

Then the choice of normalization form for storage strings only affects
peers that read them back -- which is enough to justify requiring the
use of a normalization form for storage strings.  The choice of K or
not K then can conceivably be left to the implementor (provided peers
are required to support non-K when reading back storage strings).

Nico
--
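To make the NFC/NFKC mismatch above concrete, here is a minimal Python
sketch using the standard unicodedata module.  The Store class and its
put/get methods are hypothetical names invented for illustration, not
anything taken from SASLprep, Net-UTF-8, or an existing implementation.
It assumes a storage node that normalizes keys on insert and normalizes
incoming query strings itself, and shows what happens when a peer
pre-normalizes its query to a different form.

```python
import unicodedata


class Store:
    """Hypothetical storage node: normalizes keys on insert and on lookup."""

    def __init__(self, form="NFC"):
        self.form = form          # normalization form chosen by the implementor
        self._items = {}

    def _norm(self, s):
        return unicodedata.normalize(self.form, s)

    def put(self, key, value):
        self._items[self._norm(key)] = value

    def get(self, raw_query):
        # The node normalizes the query; the peer may send it unnormalized.
        return self._items.get(self._norm(raw_query))


# U+FB01 (LATIN SMALL LIGATURE FI) has only a compatibility decomposition:
# NFC leaves it alone, NFKC folds it to the two letters "fi".
name = "\ufb01le"
as_nfc = unicodedata.normalize("NFC", name)
as_nfkc = unicodedata.normalize("NFKC", name)
forms_differ = as_nfc != as_nfkc

nfc_store = Store("NFC")
nfc_store.put(name, "entry")

# A raw (unnormalized) query is normalized by the node and matches.
raw_hit = nfc_store.get(name)

# A decomposed query (e + combining acute) also matches a composed key.
nfc_store.put("caf\u00e9", "coffee")
decomposed_hit = nfc_store.get("cafe\u0301")

# A query the peer already folded to NFKC no longer matches the NFC key.
prenormalized_hit = nfc_store.get(as_nfkc)

# A node that chose NFKC for storage matches both spellings.
nfkc_store = Store("NFKC")
nfkc_store.put(name, "entry")
nfkc_hits = (nfkc_store.get(name), nfkc_store.get("file"))
```

In this sketch only the NFKC-folded query against the NFC store misses:
the fold to "fi" cannot be undone by the node, whereas a raw query
leaves the node free to apply whichever form it stores.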