On Tue, Jan 22, 2008 at 08:34:27AM -0500, Theodore Tso wrote:
> * Documenting HFS+'s current pseudo-normalization algorithm.
> It's not enough to say that you need to decompose all
> Unicode characters, since you've claimed that HFS+ doesn't
> decompose Unicode characters after some magic date,
> presumably roughly 9 years ago.

I did some research on this point, since if we really are going to be
compatible with Mac OS X's crappy HFS+ system, we need to know what the
decomposition algorithm actually is. Turns out, there are *two* of
them.

Kevin didn't know what he was talking about. In fact, different
versions of Mac OS use different normalization algorithms. Mac OS 8.1
through Mac OS X 10.2.x used decompositions based on Unicode 2.1. Mac
OS X 10.3 and later use decompositions based on Unicode 3.2.[1]

As I correctly predicted, Apple is changing their normalization
algorithm between versions of Mac OS X. It is not static, which means
there will be compatibility problems when moving hard drives between
Mac OS X versions. I don't know whether their fsck tries to fix this
when upgrading from 10.2 to 10.3, but if not, certain files could
disappear as part of the Mac OS X upgrade. Fun fun fun.

And clearly Kevin didn't read the tech note very carefully, since it
clearly admits why they did it. The Mac OS X developers were being
cheesy with how they implemented their HFS B-tree algorithms, and took
the cheap, easy way out.

So yeah, "crappy" is the only word that can be used for what Mac OS X
perpetrated on the world. Because of that, a quick Google search shows
it causes problems all over the stack, for many different programs
beyond just git, including limewire and gnutella[2][3], Slim[4], and no
doubt others.
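To make the decomposition issue concrete, here is a minimal sketch of how one visible filename can have two different byte sequences. It uses Python's standard `unicodedata` module and plain NFD, which is an assumption for illustration: it follows whatever Unicode version the Python build ships with, not the Unicode 2.1 or 3.2 tables described in the tech note, and it does not reproduce HFS+'s own variant. The helper names `utf8_hex` and `same_name_after_decomposition` are made up for this example.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9).
precomposed = "\u00e9"

# Standard NFD splits it into "e" plus a combining acute accent (U+0301).
# Assumption: plain NFD stands in for the HFS+ table, which differs in detail.
decomposed = unicodedata.normalize("NFD", precomposed)


def utf8_hex(name):
    """Return the UTF-8 bytes of a name as a hex string."""
    return name.encode("utf-8").hex()


def same_name_after_decomposition(a, b):
    """Compare two names after decomposing both with NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(utf8_hex(precomposed))   # c3a9
print(utf8_hex(decomposed))    # 65cc81
print(precomposed == decomposed)                              # False
print(same_name_after_decomposition("caf\u00e9", "cafe\u0301"))  # True
```

A tool that stores the bytes it was given and later compares them with the bytes a directory listing returns will see a mismatch whenever the filesystem has rewritten the name in between, which is the kind of breakage described above.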
[1] http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties
[2] http://lists.limewire.org/pipermail/gui-dev/2003-January/001110.html
[3] http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html
[4] http://forums.slimdevices.com/showthread.php?t=40582

In any case, it seems pretty clear that by now everyone except Kevin
has realized that HFS+ is crappy and causes Internet-wide
interoperability problems. So I'll justify sending this note by
pointing out the specific table of Mac OS's filesystem corruption
algorithm can be found here:

http://developer.apple.com/technotes/tn/tn1150table.html

I'd also recommend that the Mac OS X code try to either figure out
whether it is running on an HFS+ partition, or let the HFS+ workaround
code be something that can be controlled via .git/config. It shouldn't
be on unconditionally even on a Mac OS X system, since if the git
repository is on a ZFS or NFS filesystem, there's no reason to pay the
overhead of working around the HFS+ bugs.

- Ted
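
The suggestion above, to work out whether a repository actually sits on a filesystem that rewrites names before enabling any workaround, could be sketched as a runtime probe. This is only an illustration under stated assumptions, not how git does or should implement it: the function name `filesystem_rewrites_names` is hypothetical, and the probe checks a single precomposed character rather than the full decomposition table.

```python
import os
import tempfile


def filesystem_rewrites_names(directory):
    """Return True if a name created in precomposed form is listed differently.

    Hypothetical probe: create a file whose name contains U+00E9 in a
    scratch directory under `directory`, then check whether the directory
    listing returns exactly the bytes that were written.
    """
    composed = "probe-\u00e9".encode("utf-8")
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        scratch_bytes = os.fsencode(scratch)
        # Create an empty file using the precomposed byte sequence.
        with open(os.path.join(scratch_bytes, composed), "wb"):
            pass
        # A bytes path makes os.listdir return the names as raw bytes.
        listed = os.listdir(scratch_bytes)
    return composed not in listed
```

Calling `filesystem_rewrites_names(path_to_repository)` once and caching the answer would let the workaround stay off on filesystems that return names unchanged. An explicit setting in .git/config, as proposed above, could still override the probe.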