Re: Cross-Platform Version Control

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 13 May 2009 09:26:19 -0700 (PDT)

On Tue, 12 May 2009, Jeff King wrote:
>
> Or they use a single encoding like utf8 so that there are no surprises.
> You can still run into normalization problems with filenames on some
> filesystems, though.  Linus's name_hash code sets up the framework to
> handle "these two names are actually equivalent", but right now I think
> there is just code for handling case-sensitivity, not utf8 normalization
> (but I just skimmed the code, so I might be wrong).

utf-8 normalization was one goal, and shouldn't be _that_ hard to do. But 
quite frankly, the index is only part of it, and probably not the worst 
part.
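The "these two names are actually equivalent" lookup mentioned in the quoted text can be sketched roughly as below. This is a Python illustration, not git's name_hash code: the class NameHash, the helper fold_name(), and the choice of NFC plus case folding as the equivalence key are all assumptions made for the example.

```python
import unicodedata


def fold_name(name: str) -> str:
    """Hypothetical equivalence key: NFC-normalize, then case-fold.

    Full Unicode caseless matching needs more care than this; it is
    enough to show the shape of the lookup.
    """
    return unicodedata.normalize("NFC", name).casefold()


class NameHash:
    """Toy index that remembers one spelling per equivalence class."""

    def __init__(self):
        self._by_key = {}

    def add(self, name: str):
        """Record name; return an existing different spelling, if any."""
        key = fold_name(name)
        existing = self._by_key.get(key)
        if existing is not None and existing != name:
            return existing  # same name as far as a lossy filesystem cares
        self._by_key[key] = name
        return None
```

With this sketch, adding "Makefile" and then "makefile" reports the first spelling on the second call, and the same happens for a precomposed and a decomposed spelling of an accented name.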

The real pain of filename handling is all the "read tree recursively with 
readdir()" issues. Along with just an absolute sh*t-load of issues about 
what to do when people ended up using different versions of the "same" 
name in different branches.
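To make the branch problem concrete, here is a small Python sketch that lists names two branches spell differently but that a case-insensitive or normalizing filesystem would treat as one file. The function conflicting_names() and the folding rule (NFC plus case folding) are assumptions for illustration only; they say nothing about how git resolves such cases.

```python
import unicodedata


def fold(name: str) -> str:
    # Hypothetical equivalence key, assumed for this example only.
    return unicodedata.normalize("NFC", name).casefold()


def conflicting_names(ours, theirs):
    """Return (our_spelling, their_spelling) pairs that differ as
    strings but collapse to the same key."""
    by_key = {fold(n): n for n in ours}
    pairs = []
    for name in theirs:
        mine = by_key.get(fold(name))
        if mine is not None and mine != name:
            pairs.append((mine, name))
    return sorted(pairs)
```

Finding the pairs is the easy half. Deciding which spelling wins on checkout or merge is the part the paragraph above calls painful, and this sketch does not attempt it.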

There's also the issue that "cross-platform" really can be a pretty damn 
big pain. What do you do for platforms that simply are pure shit? I 
realize that OS X people have a hard time accepting it, but OS X 
filesystems are generally total and utter crap - even more so than 
Windows.

Yes, yes, you can tell OS X that case matters, but that's not the normal 
case - and what do you do with projects that simply _do_ care about case? 
The kernel is one such project.

Sure, you can "encode" the filenames on such broken filesystems in a way 
that they'd be different - but that won't really help the project, since 
makefiles etc won't work anyway.

So one reason I didn't bother with utf-8 is that the much more fundamental 
issues are simply in plain old 7-bit US-ASCII. 

That said, if the only issue is that you want to encode regular utf-8 in a 
coherent way (and ignore the case issues), then we could probably do that 
part fairly easily with a "convert_to_internal()" and 
"convert_to_filename()" thing that acts very much like the CRLF conversion 
(except on filenames, not data).
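A minimal sketch of that pair of conversions, in Python rather than git's C. The two function names come from the message; everything else is assumed for illustration: the bodies, the choice of NFC as the internal form, and the fs_form parameter standing in for whatever form a given filesystem prefers (NFD here, as a decomposing filesystem would use).

```python
import unicodedata


def convert_to_internal(name: str) -> str:
    """Map a name as read from disk to one canonical internal form.

    NFC is an assumption; the message does not say which form to use.
    """
    return unicodedata.normalize("NFC", name)


def convert_to_filename(name: str, fs_form: str = "NFD") -> str:
    """Map an internal name to the form the filesystem will store.

    fs_form is a hypothetical knob for this sketch.
    """
    return unicodedata.normalize(fs_form, name)
```

Like the CRLF conversion, the point is that the round trip is stable: converting to a filename and back to internal form yields the same internal name, so the index never sees the filesystem's spelling.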

And yes, it's probably worth doing, since we'd need that for fuller case 
support anyway.

It's just a fair amount of churn - not fundamentally _hard_, but not 
trivial either. And it needs a _lot_ of care, and a fair amount of 
testing that is probably hard to do on sane filesystems (i.e. the case where 
the filesystem actually _changes_ the name is going to be hard to test on 
anything sane).
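One way to exercise the "filesystem changes the name" case without such a filesystem is to fake it in tests. The sketch below is a Python stand-in; the class ManglingDir and its behaviour (storing every name in NFD and returning that form from a listing) are assumptions for illustration, not a model of any particular filesystem.

```python
import unicodedata


class ManglingDir:
    """In-memory directory that rewrites names on create."""

    def __init__(self, form: str = "NFD"):
        self._form = form
        self._files = {}

    def create(self, name: str, data: bytes = b"") -> None:
        # The stored name may differ from the one the caller passed in.
        self._files[unicodedata.normalize(self._form, name)] = data

    def listdir(self):
        # Plays the role of readdir(): returns the stored spellings.
        return sorted(self._files)
```

A test can create a file under a precomposed name and check that code reading the listing still matches it to the original entry, even though the string that comes back is different.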

			Linus