Re: git on MacOSX and files with decomposed utf-8 file names

On Jan 16, 2008, at 11:46 AM, Jakub Narebski wrote:

More like, Mac OS X has standardized on Unicode and the rest of the
world hasn't caught up yet. Git is the only tool I've ever heard of that
has a problem with OS X using Unicode.

No. That's not at all the problem. Mac OS X insists on storing _another_
encoding of your filename.  Both are UTF-8.  Both encode the _same_
string.  Yet they are different, bytewise.  For no good reason.

To be more exact, the encoding used to _create_ a file differs from the
encoding returned when _reading the directory_...

Stop spreading FUD. Git can handle Unicode just fine. In fact, Git does not _care_ how the filename is encoded, it _respects_ the user's choice,
not only of the encoding _type_, but the _encoding_, too.

...which means that the sequences of bytes differ. And Git by design is
(both for filenames and for blob contents) encoding agnostic.

HFS+ is just _stupid_. And unfortunately Git doesn't support stupid
filesystems (e.g. case insensitive filesystems) well.

There are two different ways to do filesystem encodings. One is to have the fs simply not care about encoding, which is what the Linux world seems to prefer. Sure, this is great in that what you create the file with is what you get back, but on the other hand, given an arbitrary non-ASCII filename on disk, you have absolutely no idea what the encoding should be and you can't display it without making assumptions (yes, you can use heuristics, but you're still making assumptions).

Filesystems like HFS+ that standardize the encoding, on the other hand, make it such that you always know what the encoding of a filename should be, so you can always display and use the filename intelligently. It also means it plays much nicer in a non-ASCII world, since you don't have to worry about different normalizations of a given string referring to different files (it's one thing to be case-sensitive, but claiming that "föo" and "föo" are different files just because one uses a composed character and the other doesn't is extremely user-unfriendly). On the other hand, what you create the file with may not be what you read back later, since the name has been standardized. It's hard to say one is better than the other; they're just different ways of doing it.

However, I have noticed that everybody who's voiced an opinion on this list in favor of the encoding-agnostic approach seems to be unwilling to accept that any other approach might have validity, to the extent of calling an OS/filesystem that does things differently stupid or insane. This strikes me as extremely elitist and risks alienating what I expect to be a fast-growing group of users (i.e. OS X users).
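
To make the composed-versus-decomposed point concrete, here is a minimal Python sketch of two UTF-8 spellings of the same name. It is only an illustration: standard NFD is used as a stand-in for the decomposed form HFS+ stores (the filesystem's actual rules are a variant of it), and `same_name` is a hypothetical helper, not anything Git or OS X provides.

```python
import unicodedata

# "föo" with 'ö' as one precomposed code point (U+00F6).
composed = "f\u00f6o"

# The same string decomposed: 'o' followed by a combining diaeresis (U+0308).
# Assumption: plain NFD approximates what a normalizing filesystem hands back.
decomposed = unicodedata.normalize("NFD", composed)

# Both are valid UTF-8 for the same text, yet the bytes differ,
# which is all an encoding-agnostic tool compares.
print(composed.encode("utf-8"))    # b'f\xc3\xb6o'
print(decomposed.encode("utf-8"))  # b'fo\xcc\x88o'
print(composed == decomposed)      # False


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(composed, decomposed))  # True
```

A byte-wise comparison sees two different names here, while a comparison after normalization sees one, which is the whole disagreement in this thread.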

I'm willing to give Linus a free pass on calling other OSes stupid and insane, as I don't think Linux would exist as it does today without his strong opinions, but I don't think this should give carte blanche to the rest of the community for this inflammatory behavior.

I should note that I'm only taking the time to discuss this because, despite the fact that I'm new to git, I really like it and I want it to work better. And one area that it has a problem with is the de facto filesystem on my OS of choice. However, attempts to discuss the problem invariably end up with multiple people calling my OS stupid and insane simply because it differs in a particular design decision. This is not a good way to build a community or to build a better product, and I hope it can be improved.

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com

