On Jan 16, 2008, at 11:08 PM, Linus Torvalds wrote:
On Wed, 16 Jan 2008, Kevin Ballard wrote:
I believe it exists because HFS+ was created at a time when the Mac was moving from a multi-encoding world (which was a nightmare) to a Unicode world and they wanted to remove ambiguity in filenames. But I wasn't around when they made this decision so this is just a guess.
I do agree. And I think starting out case-insensitive (something they must really hate by now) also made it less of an issue. When you're case-insensitive, the issues with any UTF-8 normalization are simply swamped by all the issues of case, so you probably don't even think about it very much.
Those of us who grew up on a case-insensitive filesystem don't find there to be any problem with it. I can count on one hand the number of times I've run into a problem caused by a case-insensitive filesystem. That number is 1. And that 1 time is when git screwed up trying to track CS4536 and cs4536 in the same directory (see earlier thread).
The big problem with any name rewriting is that I can open file 'xyz', and I literally have a very hard time knowing whether that file I know I opened and created has anything to do with the file 'Xyz' that I see when I do a readdir().
That's only true if you don't know what type of filesystem you're on. And, in the vast majority of cases (in fact, a content tracker is the only exception I can think of), it doesn't matter. If the user said 'xyz' and you can stat() it, great, that's what the user wanted! Just because it's really called 'Xyz' on the filesystem doesn't make any difference.
Are they the same? Maybe. But it's literally hard to tell on OS X. I can do an fstat() on my file descriptor and on the directory entry, and if they get the same d_ino they *probably* are the same entry, but even then it actually could have been a hardlink (and my 'xyz' is really *another* name for it entirely, and the filesystem is actually case-sensitive and 'Xyz' was a *different* name that somebody else did!).
See? If you're creating a content tracker, these kinds of issues are not "idle chatter". It's really *really* important. Was that file the one I was told to track? Or was it a temporary file that was just hardlinked?
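To make the check described above concrete, here is a minimal Python sketch of comparing an open file descriptor against the entries readdir() would return. The helper name names_matching_fd is hypothetical, and this is only an illustration of the idea, not how git does it. Note that it returns every name sharing the inode, which is exactly the ambiguity being described: a case variant and a genuine hardlink look the same.

```python
import os

def names_matching_fd(fd, dirpath):
    """Return the names in dirpath that refer to the same inode as fd.

    Hypothetical helper: a match only means "same inode", so it cannot
    distinguish a case-folded alias from a separate hardlink.
    """
    opened = os.fstat(fd)
    matches = []
    with os.scandir(dirpath) as entries:
        for entry in entries:
            # entry.inode() is the inode number stored in the directory entry
            if entry.inode() != opened.st_ino:
                continue
            # Inode numbers are only unique per device, so check that too
            if entry.stat(follow_symlinks=False).st_dev == opened.st_dev:
                matches.append(entry.name)
    return sorted(matches)
```

If more than one name comes back, the inode alone does not say which of them is the one the user asked to track.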
But git is a content tracker, so even if it's really a different hardlink that shouldn't matter; it's still referencing the same content. Go ahead and track whatever name the user specified originally. As long as it maps to a file on disk with the expected content, you're set. If the file is really called 'foo' and I told git to track 'Foo', I'm perfectly happy with it continuing to think 'foo' is 'Foo' until I use 'git mv Foo foo'.
This is why case-insensitivity is so hard: you have a very real "aliasing" on the filesystem level, where all those really *different* pathnames end up being the same thing.
I don't see that as being a problem. Think of it, if you will, as if every single file simply had an implicit hardlink for every possible case or normalization variant. The whole point of the filename is that it is meta-information, used as an identifier and not as actual content, and thus it is perfectly fine for it to be a real string, subject to interpretation, rather than treated as a sacred binary blob like content is. The whole purpose of the name is to identify the inode in question, and case and normalization aren't particularly relevant here. As long as we can identify the file, we're happy.
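As a rough illustration of what "every possible case or normalization variant" means, here is a small Python sketch that treats two names as the same identifier when they agree after Unicode decomposition and case folding. The function names are hypothetical, and the standard NFD and casefold() rules are used as a stand-in; HFS+ applies its own decomposition and case-folding tables, so this is an approximation and not the filesystem's actual comparison.

```python
import unicodedata

def fold_name(name):
    """Map a filename to a canonical key (approximation, not the HFS+ rules)."""
    decomposed = unicodedata.normalize("NFD", name)
    # Re-normalize after folding, since case folding can undo normalization
    return unicodedata.normalize("NFD", decomposed.casefold())

def same_name(a, b):
    """True if a and b differ only by case or by composed/decomposed form."""
    return fold_name(a) == fold_name(b)

composed = "caf\u00e9"      # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # different bytes
print(same_name(composed, decomposed))                         # same identifier
print(same_name("Xyz", "xyz"))
```

Under this view the byte strings differ but the key is identical, which is the "implicit hardlink" picture: many spellings, one file.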
And all the same issues show up with utf-8 rewriting, so if you normalize utf-8 names, you actually end up having almost all the same problems that a case-insensitive filesystem has. They're just much rarer in practice, so you just won't hit them as often - but when you do, they are equally painful! (In fact, they can be a whole lot *more* painful, because now they are really rare, and really confusing when they happen!) But if you come from a case-insensitive background, all the UTF-8 rewriting really looks like such a small problem compared to all the horrid problems that you had with different locales and cases, so I suspect they didn't even realize what a big mistake they did!
Again, as someone who grew up in a case-insensitive world, there are no problems here. I wish I could tell you that it causes problems, I wish I could agree with you, but I can't.
-Kevin Ballard
--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com