On Wed, Jan 16, 2008 at 03:39:36PM -0500, Kevin Ballard wrote:
> On Jan 16, 2008, at 11:46 AM, Jakub Narebski wrote:
>
> >
> >HFS+ is just _stupid_. And unfortunately Git doesn't support stupid
> >filesystems (e.g. case insensitive filesystems) well.
>
> There's two different ways to do filesystem encodings. One is to have
> the fs simply not care about encoding, which is what the linux world
> seems to prefer.

There is no technical reason for the *kernel* to care about file name
encoding. It is something that can and should be dealt with in user
space (except for some special cases like smbfs).

> Sure, this is great in that what you create the file
> with is what you get back,

And also because a user space program can deal with it much more
gracefully...

> but on the other hand, given an arbitrary
> non-ASCII file on disk, you have absolutely no idea what the encoding
> should be and you can't display it without making assumptions (yes you
> can use heuristics, but you're still making assumptions).

Wrong. If you have a policy that all file names are stored in UTF-8,
then there is no problem here. Encoding should not be the kernel's
problem; besides, you cannot fully solve it in kernel space anyway...

> Filesystems
> like HFS+ that standardize the encoding,

Yeah, right... The way Microsoft likes to "standardize" everything,
which in practice means forcing on others something fundamentally
broken that does not precisely follow any existing standard:

===
IMPORTANT: The terms used in this Q&A, decomposed and precomposed,
roughly correspond to Unicode Normal Forms D and C, respectively.
However, most volume formats do not follow the exact specification for
these normal forms.
===
http://developer.apple.com/qa/qa2001/qa1173.html

Not to mention that the use of decomposed Unicode as the standard is
outright silly -- no sane person writes in "decomposed" Unicode...
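Here is a concrete case of the precomposed/decomposed distinction. The same visible name "föo" can be stored as two different UTF-8 byte sequences. Both are valid UTF-8, and any UTF-8-aware program can display either one.

The minimal Python sketch below uses the standard unicodedata module, which implements the exact Unicode Normal Forms C and D. Apple's note above says HFS+ only roughly follows NFD, so this illustrates the idea rather than reproducing what HFS+ actually stores.

```python
import unicodedata

# "föo" written with the single precomposed code point U+00F6 (Normal Form C).
precomposed = "f\u00f6o"
# The same name written as "o" followed by U+0308 COMBINING DIAERESIS (Normal Form D).
decomposed = "fo\u0308o"

# Both encode to valid UTF-8 and render identically, but the bytes differ.
nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")
print(nfc_bytes)               # b'f\xc3\xb6o'
print(nfd_bytes)               # b'fo\xcc\x88o'
print(nfc_bytes == nfd_bytes)  # False

# Normalizing either string to a common form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

The normalization here is an ordinary user-space library call.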
> on the other hand, make it
> such that you always know what the encoding of a file should be, so
> you can always display and use the filename intelligently.

Somehow I have no problem displaying non-ASCII names on Linux. I can
see symbols encoded in both Unicode Normal Forms C and D without any
problem, even though the kernel is completely unaware of them.

> It also
> means it plays much nicer in a non-ASCII world, since you don't have
> to worry about different normalizations of a given string referring to
> different files (it's one thing to be case-sensitive, but claiming
> that "föo" and "föo" are different files

As you typed them, they are exactly the same, and both of them are in
Normal Form C (which Apple calls precomposed). So why do you use one
normal form in your writing and the other in your file names?

> just because one uses a
> composed character and the other doesn't is extremely user-
> unfriendly). On the other hand, what you create the file with may not
> be what you read back later, since the name has been standardized.
> It's hard to say one is better than the other, they're just different
> ways of doing it. However, I have noticed that everybody who's voiced
> an opinion on this list in favor of the encoding-agnostic approach
> seem to be unwilling to accept that any other approach might have
> validity, to the extent of calling an OS/filesystem that does things
> different stupid or insane. This strikes me as extremely elitist and
> risks alienating what I expect to be a fast-growing group of users
> (i.e. OS X users).

I am sure everyone here is scared to death... I mean, we are used to
hearing such threats from some MS salespeople, but from a Mac guy? That
is really scary....

Wake up, and stop shooting this nonsense at us. If you have technical
reasons why your solution is better, let us know. So far, you do not
sound very convincing. Why do you think that the issue of encoding
cannot be dealt with in user space?
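Here is one way to handle this in user space. The sketch below compares a list of recorded file names, stored precomposed, with a directory listing such as os.listdir() might return on HFS+, where the same name can come back decomposed. This also illustrates the stored-file-list versus readdir() comparison problem raised further down.

Some assumptions, stated up front:
- The helper names and sample data are hypothetical.
- This is not how Git itself handles file names.
- Folding both sides to NFC is an assumed policy, and only an approximation, since HFS+ does not follow NFD exactly.

```python
import unicodedata

def normalize_name(name):
    # Hypothetical helper: fold a file name to Normal Form C before comparing.
    # This is an assumed user-space policy, not something the kernel does.
    return unicodedata.normalize("NFC", name)

def compare_file_lists(recorded, on_disk):
    """Hypothetical helper: return (missing, untracked) after NFC folding.

    missing   -- names recorded but not found on disk
    untracked -- names found on disk but never recorded
    """
    recorded_set = {normalize_name(n) for n in recorded}
    on_disk_set = {normalize_name(n) for n in on_disk}
    return sorted(recorded_set - on_disk_set), sorted(on_disk_set - recorded_set)

# A name recorded precomposed, and a listing that returns the same name
# decomposed (as HFS+ would, approximately), plus one genuinely new file.
recorded = ["README", "f\u00f6o"]
listing = ["README", "fo\u0308o", "new.txt"]

missing, untracked = compare_file_lists(recorded, listing)
print(missing)    # []
print(untracked)  # ['new.txt']

# Without normalization, a byte-for-byte comparison reports "föo" as missing.
naive_missing = sorted(set(recorded) - set(listing))
print(naive_missing == ["f\u00f6o"])  # True
```

The naive set difference shows the problem directly. Without normalization, one and the same file is reported as missing, and its decomposed twin would show up as untracked.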
Why does Mac OS X use so-called decomposed Unicode, which does not even
follow any standard precisely? Why did Mac OS X choose to decompose
characters when doing so does not solve any real issue?

> And one area that it has a problem with is the de-
> facto filesystem on my OS of choice.

I suppose that would be a much better subject for discussion... At
least, it would be more likely to result in Git working better on your
OS.

> However, attempts to discuss the
> problem invariable end up with multiple people calling my OS stupid
> and insane simply because it differs in a particular design decision.

First, no one called Mac OS X insane, only case-insensitive
filesystems, and there are good reasons to think so. No one has
demonstrated any advantage of that approach so far, but the
disadvantages are obvious to anyone: comparing a stored file list with
what readdir() returns is much more problematic, and you cannot claim
to have solved the encoding problem if you force other people to
*duplicate* logic that Mac OS X performs in its kernel just to get
things working...

So, no one thinks it is insane because it is different, but because it
requires much more effort to do the same thing -- compare two file
lists -- and this operation is important for Git to work properly...

Dmitry