Re: git on MacOSX and files with decomposed utf-8 file names

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Wed, 16 Jan 2008, Kevin Ballard wrote:
> 
> I believe it exists because HFS+ was created at a time when the Mac was moving
> from a multi-encoding world (which was a nightmare) to a Unicode world and
> they wanted to remove ambiguity in filenames. But I wasn't around when they
> made this decision so this is just a guess.

I do agree. And I think starting out case-insensitive (something they must 
really hate by now) also made it less of an issue. When you're 
case-insensitive, the issues with any UTF-8 normalization are simply 
swamped by all the issues of case, so you probably don't even think about 
it very much.

The big problem with any name rewriting is that I can open file 'xyz', and 
I literally have a very hard time knowing whether that file I know I 
opened and created has anything to do with the file 'Xyz' that I see when 
I do a readdir().

Are they the same? Maybe. But it's literally hard to tell on OS X. I can 
do an fstat() on my file descriptor and on the directory entry, and if 
they get the same d_ino they *probably are the same entry, but even then 
it actually could have been a hardlink (and my 'xyz' is really *another* 
name for it entirely, and the filesystem is actually case-sensitive and 
'Xyz' was a *different* name that somebody else did!).

See? If you're creating a content tracker, these kinds of issues are not 
"idle chatter". It's really *really* important. Was that file the one I 
was told to track? Or was it a temporary file that was just hardlinked? 

This is why case-insensitivity is so hard: you have a very real "aliasing" 
on the filesystem level, where all those really *different* pathnames end 
up being the same thing.

And all the same issues show up with utf-8 rewriting, so if you normalize 
utf-8 names, you actually end up having almost all the same problems that 
a case-insensitive filesystem has. They're just much rarer in practice, so 
you just won't hit them as often - but when you do, they are equally 
painful!

(In fact, they can be a whole lot *more* painful, because now they are 
really rare, and really confusing when they happen!)

But if you come from a case-insensitive background, all the UTF-8 
rewriting really looks like such a small problem compared to all the 
horrid problems that you had with different locales and cases, so I 
suspect they didn't even realize what a big mistake they did!

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux