On 13.05.2009 at 23:10, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
On Wed, 13 May 2009, Matthias Andree wrote:
On 13.05.2009 at 19:12, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something to do
> the actual normalization if you find characters with the high bit set. And
> since I know that the OS X filesystems are so buggy as to not even do that
> whole NFD thing right, there is probably some OS-X specific "use this for
> filesystem names" conversion function.
Sorry for interrupting, but NF_K_C? You don't want that (K for compatibility,
rather than canonical, normalization) for anything except normalizing
temporary variables inside strcasecmp(3) or similar. Probably not even that.
The normalizations done are often irreversible and also surprising. You don't
want to turn 2³.c into 23.c, do you?
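
To make the 2³.c example concrete, here is a minimal sketch using Python's
standard unicodedata module (my choice for illustration only; nobody in this
thread proposed Python). It compares canonical (NFC) and compatibility (NFKC)
normalization on that file name.

```python
import unicodedata

# "2³.c": digit two, SUPERSCRIPT THREE (U+00B3), then ".c"
name = "2\u00b3.c"

# Canonical composition: U+00B3 has no canonical decomposition,
# so the name comes back unchanged.
nfc = unicodedata.normalize("NFC", name)

# Compatibility composition: U+00B3 is mapped to a plain "3",
# and the original spelling cannot be recovered afterwards.
nfkc = unicodedata.normalize("NFKC", name)

print(nfc == name)  # True
print(nfkc)         # 23.c
```

After NFKC, 2³.c and 23.c are the same string, so two distinct files would
collide.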
No, you're right. We want just plain NFC. I just googled for how some
other projects handled this, and found the stringprep thing in a post
about rsync, and didn't look any closer.
But yes, you're absolutely right, stringprep is total crap, and nfkc is
horrible.
Crap? It's just beside the point here: it only gives you a limited form of
fuzzy matching. Anyway...
I have no idea of what library to use, though. For perl, there's
Unicode::Normalize, but that's likely still subtly incorrect for the OS-X
case due to the filesystem not using _strict_ NFD.
Perhaps ICU (ICU4C), from http://site.icu-project.org/
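
For illustration only, a sketch of the plain-NFC step discussed above, again
using Python's standard unicodedata module as a stand-in rather than ICU. The
helper name to_nfc and the sample file name are hypothetical. Note that this
uses strict NFD and NFC; it does not reproduce whatever variant of NFD the
OS X filesystem applies.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return the canonical composed (NFC) form of a file name."""
    return unicodedata.normalize("NFC", name)


# The same name in two canonically equivalent spellings:
decomposed = "Ma\u0308rchen.txt"   # "a" followed by COMBINING DIAERESIS
precomposed = "M\u00e4rchen.txt"   # single code point U+00E4

# The raw strings differ code point by code point...
assert decomposed != precomposed

# ...but agree once both are brought to NFC.
assert to_nfc(decomposed) == to_nfc(precomposed) == precomposed

# Strict NFD goes the other way and yields the decomposed spelling.
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

Unlike NFKC, this only unifies spellings that Unicode defines as canonically
equivalent, so a name such as 2³.c is left alone.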
--
Matthias Andree