Re: git on MacOSX and files with decomposed utf-8 file names

On Jan 16, 2008, at 11:51 PM, Martin Langhoff wrote:

On Jan 17, 2008 5:30 PM, Kevin Ballard <kevin@xxxxxx> wrote:
Those of us who grew up on a case-insensitive filesystem don't find
there to be any problem with it. I can count on one hand the number of

I guess you haven't used unix tools much. The ever-popular HEAD perl
utility (which does an HTTP HEAD against a URL), when installed,
silently overwrites the head shell utility, which is used for all
sorts of things, some even in startup scripts. Ooops! I've been hit by
this more than once - and if you google for it, it hurt a lot of
people.

I can imagine. However, I've never been hit by such a situation. This doesn't mean a case-insensitive filesystem is a problem per se; it means interactions between a case-insensitive and a case-sensitive filesystem can be a problem. That doesn't mean either way is "correct"; it just means the two don't work well together.

I like ice cream, and I like steak, but I sure don't think a mixture of steak and ice cream would go well together. Do you?

That's only true if you don't know what type of filesystem you're on.
And, in the vast majority of cases (in fact, a content tracker is the
only exception I can think of), it doesn't matter. If the user said

Hmmm. Many important tools - that I wouldn't want to ever fail! - have
similar needs to git. Backup/restore and file replication tools for
example.

Both of which would be replicating the directory contents, not a listing of files specified by the user. If, as a user, I were to say "please replicate file FOO" and the file was really called "foo", I wouldn't be in the least surprised to see the tool take me at my word and produce a file called "FOO" with the contents of "foo". But in general, things like this operate on the filesystem, not on the user args.

This is why case-insensitivity is so hard: you have a very real
"aliasing" on the filesystem level, where all those really *different*
pathnames end up being the same thing.

I don't see that as being a problem. Think of it, if you will, as if
every single file simply had an implicit hardlink for every possible
case or normalization variant. The whole point of the filename is that

Ok - but how do you track the directory then (in git's terms, the
tree). There's no way to tell what the user wants. Does the user want
a copy of the file with different capitalization, or is the OS playing
games?

If I say "track FOO", I probably mean it. So go ahead and track "FOO", even if you end up tracking the contents of file "foo". I certainly won't blame the tool for doing what I told it.
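To make the "aliasing" under discussion concrete, here is a minimal Python sketch that groups a list of pathnames by an equivalence key. The key used here (NFC normalization followed by Unicode case folding) is an assumption chosen for illustration; it only approximates what a case-insensitive, normalizing filesystem does and is not HFS+'s actual comparison rule. The names alias_key and find_aliases are hypothetical.

```python
import unicodedata
from collections import defaultdict


def alias_key(path: str) -> str:
    """Illustrative equivalence key: NFC-normalize, then case-fold.

    This is an approximation for demonstration, not the HFS+ algorithm.
    """
    return unicodedata.normalize("NFC", path).casefold()


def find_aliases(paths):
    """Return groups of names that share the same equivalence key."""
    groups = defaultdict(list)
    for path in paths:
        groups[alias_key(path)].append(path)
    # Keep only the keys that more than one distinct spelling maps to.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    tracked = [
        "foo", "FOO",           # differ only in case
        "HEAD", "head",         # the perl utility vs. the shell utility
        "M\u00e4rchen",         # precomposed a-umlaut
        "Ma\u0308rchen",        # 'a' followed by combining diaeresis
        "README",
    ]
    for group in find_aliases(tracked):
        print([name.encode("utf-8") for name in group])
```

On a case-sensitive, byte-exact filesystem each of these names is a distinct file; under the key above, each group collapses to a single entry, which is the situation a content tracker has to decide how to handle.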

it is meta-information, used as an identifier and not as actual
content, and thus it is perfectly fine for it to be a real string,
subject to interpretation,

I don't think you *actually* want it subject to interpretation.

Sure I do. I find it very convenient, for example, to say "cd documents/school" when I really want to go to "Documents/School". Similarly, if I'm trying to reference gitweb/tests/Märchen, I'm quite happy to not have to figure out what normalization the filename is using and attempt to replicate that (especially as I have no idea which normalization my input mechanism uses - unlike Linus, I don't have a key dedicated to ä, and even if I did I wouldn't necessarily expect it to use precomposed vs decomposed). I can't think of a single reason why I'd want to be able to have two different files named "Märchen" on my disk.

On the other hand, treating Unicode normalization as significant can pose security risks - how am I to know that the file that is named "foo.txt" is really the same file "foo.txt" that I last saw? Someone I know on IRC sent me this image[1], which shows 6 files all apparently named "foo.txt" on a disk image. This is possible because on a case-sensitive HFS+ volume, the file system doesn't ignore ignorables when comparing filenames (it does on a case-insensitive HFS+ volume), and so all of those filenames look identical up until you actually pipe their names through xxd and look at the byte sequence. When this sort of tomfoolery is possible, I simply cannot trust the names of any of my files anymore.

[1]: http://sailor月.com/imgs/ignorable.png
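As an illustration of both points, here is a minimal Python sketch using the standard unicodedata module. It shows the precomposed and decomposed spellings of "Märchen" as byte sequences (roughly what xxd would show), and a "foo.txt" lookalike containing U+200C ZERO WIDTH NON-JOINER. Using U+200C as the example ignorable is an assumption; which characters appear in the image above is not known here. The helper same_name is hypothetical.

```python
import unicodedata

# Two spellings that render identically.
nfc = unicodedata.normalize("NFC", "M\u00e4rchen")   # precomposed U+00E4
nfd = unicodedata.normalize("NFD", "M\u00e4rchen")   # 'a' + combining U+0308


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A lookalike: an invisible character hidden inside the name (assumed example).
plain = "foo.txt"
sneaky = "foo\u200c.txt"

if __name__ == "__main__":
    print(nfc == nfd)                  # False: different code point sequences
    print(nfc.encode("utf-8").hex())   # 4dc3a4726368656e
    print(nfd.encode("utf-8").hex())   # 4d61cc88726368656e
    print(same_name(nfc, nfd))         # True: equal once normalized
    print(same_name(plain, sneaky))    # False: normalization keeps U+200C
    print(sneaky.encode("utf-8").hex())
```

Note that normalization alone does not fold the lookalike into "foo.txt"; a comparison that wants to treat them as equal has to skip ignorable characters as a separate step.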

Again, as someone who grew up in a case-insensitive world, there's no
problems here. I wish I could tell you that it causes problems, I wish
I could agree with you, but I can't.

Probably because you have been surrounded by tools that have a lot of
extra code to cope with the case insensitive way of life, and learned
to not do things that are completely valid, just to avoid trouble.
Which is ok, but I don't think it makes the OS design decision

Extra code? I don't think so. The only reason I'd need extra code is if I were attempting to explicitly detect the "real" filename for a user-supplied argument, by scanning the directory contents until I found a file that was equivalent to the given argument. But there's no reason to do that. None of the code I've ever written, or any of the code I've ever seen, has had to do any extra work because it was on a case-insensitive filesystem. I contribute to a packaging system for the Mac called MacPorts, and I've never seen any patches on any of the 4000+ ports to handle case insensitivity (granted, I haven't looked at every port, but I've looked at a significant fraction). It's a complete non-issue.
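For reference, the directory scan described above might look roughly like the following Python sketch. The equivalence used (NFC normalization plus case folding) is an assumption for illustration and is not the exact rule HFS+ applies; equivalence_key and find_real_name are hypothetical names.

```python
import os
import unicodedata


def equivalence_key(name: str) -> str:
    """Illustrative key: NFC-normalize, then case-fold (not the HFS+ rule)."""
    return unicodedata.normalize("NFC", name).casefold()


def find_real_name(directory: str, wanted: str):
    """Return the on-disk entry equivalent to `wanted`, or None if absent."""
    key = equivalence_key(wanted)
    for entry in os.listdir(directory):
        if equivalence_key(entry) == key:
            return entry
    return None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Create a file called "foo", then ask for it as "FOO".
        open(os.path.join(tmp, "foo"), "w").close()
        print(find_real_name(tmp, "FOO"))
        print(find_real_name(tmp, "bar"))
```

A tool that just opens whatever name the user supplied never needs this; it only comes into play when the tool wants to record the name as stored on disk rather than as typed.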

The content of files is sacred. The filename is only there to provide a handle to locate the contents. I don't see any problem with expanding the equivalency scope of the filename to accept multiple encodings and cases. The only arguments I can see that have any validity at all are the ones that sound like "we use case-sensitive filesystems, and your case-insensitivity and normalization are causing problems with our tools! Conform to our world!". As I said above, this isn't a problem of case-insensitivity or normalization; it's a problem of interaction between two incompatible viewpoints. All I want to do is make git play nicer in an HFS+ world, and this would be far easier if you guys were willing to admit this is a problem that should be solved in the tool rather than a problem with the system.

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com



