On Tue, 12 May 2009, Jeff King wrote:
>
> Or they use a single encoding like utf8 so that there are no surprises.
> You can still run into normalization problems with filenames on some
> filesystems, though. Linus's name_hash code sets up the framework to
> handle "these two names are actually equivalent", but right now I think
> there is just code for handling case-sensitivity, not utf8 normalization
> (but I just skimmed the code, so I might be wrong).

utf-8 normalization was one goal, and shouldn't be _that_ hard to do. But
quite frankly, the index is only part of it, and probably not the worst
part.

The real pain of filename handling is all the "read tree recursively with
readdir()" issues. Along with just an absolute sh*t-load of issues about
what to do when people ended up using different versions of the "same"
name in different branches.

There's also the issue that "cross-platform" really can be a pretty damn
big pain. What do you do for platforms that simply are pure shit? I
realize that OS X people have a hard time accepting it, but OS X
filesystems are generally total and utter crap - even more so than
Windows.

Yes, yes, you can tell OS X that case matters, but that's not the normal
case - and what do you do with projects that simply _do_ care about case.
The kernel is one such project.

Sure, you can "encode" the filenames on such broken filesystems in a way
that they'd be different - but that won't really help the project, since
makefiles etc won't work anyway.

So one reason I didn't bother with utf-8 is that the much more
fundamental issues are simply in plain old 7-bit US-ASCII.

That said, if the only issue is that you want to encode regular utf-8 in
a coherent way (and ignore the case issues), then we could probably do
that part fairly easily with a "convert_to_internal()" and
"convert_to_filename()" thing that acts very much like the CRLF
conversion (except on filenames, not data).

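To make the "convert_to_internal()" / "convert_to_filename()" idea a bit
more concrete, here is a rough sketch in Python. It is not git code. The
two function names come from the paragraph above, but everything else is
an assumption made for illustration: that the internal form is NFC, that
the filesystem wants a decomposed (NFD-style) form, and that names which
are not valid utf-8 are passed through untouched. It also ignores case
entirely, as the paragraph above suggests.

```python
import unicodedata

# Assumption: the index stores names in NFC. The thread does not fix a form.
INTERNAL_FORM = "NFC"


def _renormalize(name: bytes, form: str) -> bytes:
    """Re-encode a utf-8 name in the given normalization form.

    Names that are not valid utf-8 are returned unchanged, so arbitrary
    byte-string filenames keep working (an assumption of this sketch).
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return name
    return unicodedata.normalize(form, text).encode("utf-8")


def convert_to_internal(fs_name: bytes) -> bytes:
    """Map a name as seen via readdir() to the form kept in the index."""
    return _renormalize(fs_name, INTERNAL_FORM)


def convert_to_filename(internal_name: bytes, fs_form: str = "NFD") -> bytes:
    """Map an index name to the form the filesystem is assumed to prefer.

    fs_form is a hypothetical per-platform setting; "NFD" stands in for a
    filesystem that hands back decomposed names.
    """
    return _renormalize(internal_name, fs_form)
```

With this, the decomposed and precomposed spellings of the "same" name
(for example "e" followed by U+0301 versus U+00E9) map to one internal
name, and plain 7-bit ASCII names such as "Makefile" come back
byte-for-byte identical in both directions.
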
And yes, it's probably worth doing, since we'd need that for fuller case
support anyway. It's just a fair amount of churn - not fundamentally
_hard_, but not trivial either. And it needs a _lot_ of care, and a fair
amount of testing that is probably hard to do on sane filesystems (ie the
case where the filesystem actually _changes_ the name is going to be hard
to test on anything sane).

		Linus