On Wed, 16 Jan 2008, Kevin Ballard wrote:
> On Jan 16, 2008, at 11:46 AM, Jakub Narebski wrote:
>>>> More like, Mac OS X has standardized on Unicode and the rest of the
>>>> world hasn't caught up yet. Git is the only tool I've ever heard
>>>> which has a problem with OS X using Unicode.
>>>
>>> No. That's not at all the problem. Mac OS X insists on storing
>>> _another_ encoding of your filename. Both are UTF-8. Both encode
>>> the _same_ string. Yet they are different, bytewise. For no good
>>> reason.
>>
>> To be more exact encoding used to _create_ file differs from encoding
>> returned when _reading directory_...
>>
>>> Stop spreading FUD. Git can handle Unicode just fine. In fact,
>>> Git does not _care_ how the filename is encoded, it _respects_ the
>>> user's choice, not only of the encoding _type_, but the _encoding_,
>>> too.
>>
>> ...which means that sequence of bytes differ. And Git by design is
>> (both for filenames and for blob contents) encoding agnostic.
>>
>> HFS+ is just _stupid_. And unfortunately Git doesn't support stupid
>> filesystems (e.g. case insensitive filesystems) well.

By the way, calling HFS+ stupid, or rather calling the use of two
different normalizations of UTF-8 (two different encodings) for writing
and for reading filenames stupid, was wrong _of me_. I quoted Linus
here when I should have used another description.

> There's two different ways to do filesystem encodings. One is to have
> the fs simply not care about encoding, which is what the linux world
> seems to prefer. Sure, this is great in that what you create the file
> with is what you get back, but on the other hand, given an arbitrary
> non-ASCII file on disk, you have absolutely no idea what the encoding
> should be and you can't display it without making assumptions (yes you
> can use heuristics, but you're still making assumptions).
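The byte-level mismatch quoted above (the name used to _create_ a file differs from the name returned when _reading the directory_) can be illustrated with a minimal Python sketch. It assumes the filesystem decomposes names to plain Unicode NFD on write; real HFS+ uses its own variant of canonical decomposition, so this is only an approximation. The helpers `utf8`, `hfs_like_readdir` and `untracked` are hypothetical, and a set of byte strings stands in for Git's index.

```python
import unicodedata


def utf8(name: str) -> bytes:
    """Encode a filename the way it would be handed to the filesystem."""
    return name.encode("utf-8")


# "föo" spelled two ways: precomposed U+00F6 (NFC) versus
# "o" followed by U+0308 COMBINING DIAERESIS (NFD).
composed = "f\u00f6o"
decomposed = unicodedata.normalize("NFD", composed)


def hfs_like_readdir(created: list[str]) -> list[bytes]:
    """Hypothetical model of a filesystem that normalizes names on write.

    Assumption: plain NFD. HFS+ uses its own variant of canonical
    decomposition, so results can differ for some characters.
    """
    return [utf8(unicodedata.normalize("NFD", n)) for n in created]


def untracked(index: set[bytes], listing: list[bytes]) -> list[bytes]:
    """Names from the directory listing with no byte-exact index entry."""
    return [n for n in listing if n not in index]


if __name__ == "__main__":
    print(utf8(composed))          # b'f\xc3\xb6o'
    print(utf8(decomposed))        # b'fo\xcc\x88o'
    print(composed == decomposed)  # False: different code points
    # True: the two spellings are canonically equivalent
    print(unicodedata.normalize("NFC", decomposed) == composed)

    # The path is recorded exactly as it was created...
    index = {utf8(composed)}
    # ...but the directory listing hands back the decomposed form.
    listing = hfs_like_readdir([composed])
    print(untracked(index, listing))  # [b'fo\xcc\x88o']
```

A byte-exact lookup, which is what an encoding-agnostic content tracker does, misses the decomposed name even though both spellings denote the same string.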
> Filesystems like HFS+ that standardize the encoding, on the other hand, make it
> such that you always know what the encoding of a file should be, so
> you can always display and use the filename intelligently. It also
> means it plays much nicer in a non-ASCII world, since you don't have
> to worry about different normalizations of a given string referring to
> different files (it's one thing to be case-sensitive, but claiming
> that "föo" and "föo" are different files just because one uses a
> composed character and the other doesn't is extremely
> user-unfriendly).

To me it looks like a layering violation... but my knowledge of
filesystems is close to nil. IMHO it is the VFS and libc that should
do the translating.

> On the other hand, what you create the file with may not
> be what you read back later, since the name has been standardized.
> It's hard to say one is better than the other, they're just different
> ways of doing it.

But using one encoding to create a file and another when reading
filenames back is strange. IMHO it would be better to simply refuse to
create filenames that are outside the chosen encoding / normalization.
What is strange is having different encodings used for reading and for
writing at the level of filesystem access (not at the level of the UI).

> However, I have noticed that everybody who's voiced
> an opinion on this list in favor of the encoding-agnostic approach
> seem to be unwilling to accept that any other approach might have
> validity, to the extent of calling an OS/filesystem that does things
> different stupid or insane. This strikes me as extremely elitist and
> risks alienating what I expect to be a fast-growing group of users
> (i.e. OS X users).

First, being encoding agnostic (being a "content tracker") is part of
Git's philosophy and the very core of its design. Second, using the
same sequence of bytes on the filesystem, in the index, and in 'tree'
objects ensures good performance...
this is something to think about if you want to add patches that would
deal with HFS+ API/UI quirks.

[cut]

-- 
Jakub Narebski
Poland