Re: git on MacOSX and files with decomposed utf-8 file names

On Tue, Jan 22, 2008 at 08:34:27AM -0500, Theodore Tso wrote:
> 	* Documenting HFS+'s current pseudo-normalization algorithm.
> 	  It's not enough to say that you need to decompose all
> 	  Unicode characters, since you've claimed that HFS+ doesn't
> 	  decompose Unicode characters after some magic date,
> 	  presumably roughly 9 years ago.

I did some research on this point, since if we really are going to be
compatible with Mac OS X's crappy HFS+ filesystem, we need to know what the
decomposition algorithm actually is.  Turns out, there are *two* of
them.  Kevin didn't know what he was talking about.  In fact,
different versions of Mac OS X use different normalization algorithms.

Mac OS 8.1 through Mac OS X 10.2.x used decompositions based on Unicode 2.1.
Mac OS X 10.3 and later use decompositions based on Unicode 3.2.[1]
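To make the problem concrete, here is a small Python sketch (an illustration only, not git or HFS+ code) of what happens when a name typed in precomposed form comes back decomposed. It uses the Unicode 3.2 database that Python's standard `unicodedata` module bundles as `unicodedata.ucd_3_2_0`, matching the version cited for 10.3 and later. Note that this is standard Unicode NFD, which only approximates HFS+'s own decomposition table, so individual characters may differ.

```python
import unicodedata

# A file name as a user might type it, with "é" as the single
# precomposed code point U+00E9.
typed = "caf\u00e9"

# Standard NFD using the Unicode 3.2 tables bundled with Python.
# This only approximates the HFS+ decomposition, which follows its own table.
stored = unicodedata.ucd_3_2_0.normalize("NFD", typed)

print(typed == stored)           # False: different code point sequences
print(typed.encode("utf-8"))     # b'caf\xc3\xa9'
print(stored.encode("utf-8"))    # b'cafe\xcc\x81'

# A byte-exact lookup, like comparing a directory listing against
# names recorded earlier, no longer finds the file.
recorded = {typed.encode("utf-8")}
print(stored.encode("utf-8") in recorded)   # False

# Recomposing (NFC) recovers the name the user typed.
print(unicodedata.normalize("NFC", stored) == typed)   # True
```

The name that was recorded and the name the filesystem hands back are simply different byte strings, which is why a program that compares names byte-for-byte sees one file vanish and an unknown one appear.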

As I correctly predicted, Apple has changed its normalization
algorithm between versions of Mac OS X.  It is not static, which
means there will be compatibility problems when moving hard drives
between Mac OS X versions.  I don't know whether their fsck tries to
fix this when upgrading from 10.2 to 10.3, but if not, certain files
could disappear as part of the Mac OS X upgrade.  Fun fun fun.

And clearly Kevin didn't read the tech note very carefully, since it
plainly admits why they did it.  The Mac OS X developers were being
cheesy with how they implemented their HFS B-tree algorithms, and took
the cheap, easy way out.  So yeah, "crappy" is the only word for what
Mac OS X perpetrated on the world.  A quick Google search shows it
causes problems all over the stack, for many programs beyond git,
including LimeWire and Gnutella[2][3], Slim[4], and no doubt others.

[1] http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties
[2] http://lists.limewire.org/pipermail/gui-dev/2003-January/001110.html
[3] http://osdir.com/ml/network.gnutella.limewire.core.devel/2003-01/msg00000.html
[4] http://forums.slimdevices.com/showthread.php?t=40582

In any case, it seems pretty clear that by now everyone except Kevin
has realized that HFS+ is crappy and causes Internet-wide
interoperability problems.  So I'll justify sending this note by
pointing out that the specific table for Mac OS's filesystem
corruption algorithm can be found here:

	  http://developer.apple.com/technotes/tn/tn1150table.html

I'd also recommend that git's Mac OS X-specific code either try to
figure out whether it is running on an HFS+ partition, or make the
HFS+ workaround something that can be controlled via .git/config.
The workaround shouldn't be on unconditionally, even on a Mac OS X
system: if the git repository is on a ZFS or NFS filesystem, there's
no reason to pay the overhead of working around HFS+'s bugs.
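A minimal Python sketch of that recommendation, assuming a hypothetical boolean setting (called `hfs_workaround` here; it is not an actual git option) and a hypothetical helper `fs_decomposes_names()`. Rather than asking which filesystem is mounted, the helper substitutes a behavioral probe: it creates a file under a precomposed name and checks whether the directory listing returns the decomposed form. An explicit setting always wins, so a user whose repository lives on ZFS or NFS can switch the workaround off outright.

```python
import os
import tempfile
import unicodedata

# "é" spelled two ways, as UTF-8 bytes.
NFC_NAME = "\u00e9".encode("utf-8")   # precomposed U+00E9
NFD_NAME = "e\u0301".encode("utf-8")  # "e" + U+0301 COMBINING ACUTE ACCENT


def fs_decomposes_names(directory: str) -> bool:
    """Hypothetical probe: create a file under its precomposed name and
    report whether the directory listing returns the decomposed form."""
    d = os.fsencode(directory)
    path = os.path.join(d, NFC_NAME)
    open(path, "wb").close()
    try:
        return NFD_NAME in os.listdir(d)
    finally:
        os.remove(path)


def should_normalize(config: dict, worktree: str) -> bool:
    """An explicit setting wins; probe the filesystem only when it is unset.
    "hfs_workaround" is a hypothetical key, not a real git option."""
    setting = config.get("hfs_workaround")
    if setting is not None:
        return bool(setting)
    return fs_decomposes_names(worktree)


def canonical_name(name: str, normalize: bool) -> str:
    """Recompose names (NFC) only when the workaround is enabled."""
    return unicodedata.normalize("NFC", name) if normalize else name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        enabled = should_normalize({}, tmp)
        print("workaround enabled:", enabled)
        print(canonical_name("cafe\u0301", enabled))
```

Probing behavior instead of the filesystem type is a design choice of this sketch: it also catches other filesystems that rewrite names, and it costs one file creation per repository rather than per lookup.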

						- Ted
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
