Re: Cross-Platform Version Control

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 13 May 2009 14:10:17 -0700 (PDT)

On Wed, 13 May 2009, Matthias Andree wrote:

> On 13.05.2009 at 19:12, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something to do
> > the actual normalization if you find characters with the high bit set. And
> > since I know that the OS X filesystems are so buggy as to not even do that
> > whole NFD thing right, there is probably some OS-X specific "use this for
> > filesystem names" conversion function.
> 
> Sorry for interrupting, but NF_K_C? You don't want that (K for compatibility,
> rather than canonical, normalization) for anything except normalizing
> temporary variables inside strcasecmp(3) or similar. Probably not even that.
> The normalizations done are often irreversible and also surprising. You don't
> want to turn 2³.c into 23.c, do you?

No, you're right. We want just plain NFC. I just googled for how some 
other projects handled this, and found the stringprep thing in a post 
about rsync, and didn't look any closer.

But yes, you're absolutely right, stringprep is total crap, and NFKC is 
horrible.
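
To make the NFC vs NFKC difference concrete, here is a minimal sketch using
Python's standard unicodedata module. Python is only an illustration here;
nothing in this thread says git would use it. It runs both forms over the
"2³.c" example from the quoted mail.

```python
import unicodedata

# "2³.c": DIGIT TWO, SUPERSCRIPT THREE (U+00B3), then ".c"
name = "2\u00b3.c"

# Canonical composition: only canonically equivalent sequences are changed.
nfc = unicodedata.normalize("NFC", name)

# Compatibility composition: also folds "compatibility" characters such as
# superscripts into their plain counterparts, which cannot be undone.
nfkc = unicodedata.normalize("NFKC", name)

print(nfc == name)     # NFC leaves the superscript alone
print(nfkc == "23.c")  # NFKC has turned the name into "23.c"
```

Both comparisons come out true: NFC keeps the name byte-for-byte, while NFKC
silently produces a different filename.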

I have no idea what library to use, though. For Perl, there's 
Unicode::Normalize, but that's likely still subtly incorrect for the OS-X 
case due to the filesystem not using _strict_ NFD.

I have this dim memory of somebody actually pointing to the documentation 
of exactly which characters OS X ends up decomposing. Maybe we could just 
do a git-specific inverse of that, knowing that NOBODY ELSE IN THE WHOLE 
UNIVERSE IS SO TERMINALLY STUPID AS TO DO THAT DECOMPOSITION, and thus the 
OS X case is the only one we need to care about?
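
As a rough sketch of that idea, assuming plain NFC is an acceptable stand-in
for the inverse: the helper below (precompose_name is a hypothetical name, not
anything in git) recomposes a filename only when it contains a byte with the
high bit set. It is NOT the exact git-specific inverse of the OS X
decomposition, since that filesystem does not use strict NFD and plain NFC may
also touch characters OS X would have left alone.

```python
import unicodedata


def precompose_name(raw: bytes) -> bytes:
    """Hypothetical helper: recompose a decomposed UTF-8 filename with NFC."""
    # Fast path: pure ASCII names have no high bit set and never change.
    if all(b < 0x80 for b in raw):
        return raw
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the bytes exactly as they are.
        return raw
    # Assumption: standard NFC approximates the inverse of what OS X does.
    return unicodedata.normalize("NFC", text).encode("utf-8")


# "e" followed by COMBINING ACUTE ACCENT, the decomposed spelling of "é"
decomposed = "e\u0301.txt".encode("utf-8")
composed = precompose_name(decomposed)
print(composed == "\u00e9.txt".encode("utf-8"))
```

The ASCII check keeps the common case free of any Unicode work, and invalid
UTF-8 is passed through untouched rather than rejected.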

			Linus