Re: git on MacOSX and files with decomposed utf-8 file names

On Wed, Jan 23, 2008 at 08:16:33AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 23 Jan 2008, Theodore Tso wrote:
> > 
> > So this demonstrates that on my MacOS 10.4.11 system, on NFS, MacOS is
> > doing no normalization, as it is creating two files.  On HFS+, MacOS
> > is mapping both filenames to the same decomposed name.
> 
> Well, it demonstrates that (a) the OS and (b) _perl_ don't mangle 
> filenames on non-HFS+ filesystems.

Well, "touch" actually, since that was what was creating the files; I
only used perl because it was the easiest way to guarantee exactly
how the filenames would be generated.
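
For readers who want to reproduce the two names, here is a minimal
sketch in Python rather than the perl used in the original test. The
sample name "M\u00e4rchen" and the helper nfc_and_nfd() are made up for
illustration; the point is only that the composed and decomposed forms
are different byte sequences for what a user sees as the same name.

```python
import unicodedata

def nfc_and_nfd(name):
    """Return the UTF-8 bytes of `name` in composed (NFC) and decomposed (NFD) form."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    return nfc, nfd

# Hypothetical sample name containing U+00E4 (a with diaeresis).
nfc, nfd = nfc_and_nfd("M\u00e4rchen")
print(nfc)  # b'M\xc3\xa4rchen'   (one precomposed code point)
print(nfd)  # b'Ma\xcc\x88rchen'  ('a' followed by combining U+0308)
```

Passing each byte string to the OS as a filename is what shows whether
the filesystem keeps two files or folds them into one.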

> The problem is that since most native applications *expect* that name 
> mangling, they'll probably do name mangling of their own (internally) just 
> to compare the names!
> 
> So I would not be surprised if the globbing libraries, for example, will 
> do NFD-mangling in order to glob "correctly", so even programs ported from 
> real Unix might end up getting pathnames subtly changed into NFD as part 
> of some hot library-on-library action with UTF hackery inside.

It's worse than that.  You can specify at format time whether or not
HFS+ is case-sensitive, and of course, there is UFS, which I expect
does no Unicode normalization at all, much like NFS.  I suspect what
you've pointed out is why certain MacOS programs break horribly when
run on non-HFS+ filesystems, though.  And if that is the case, then
those same programs might not be reliable if the user's home
directory is stored on NFS --- like they would be in an
enterprise/corporate environment, if Apple ever wants to have any hope
of penetrating that market.

Because of this, git code won't be able to just check for HFS+; it
will probably have to do a run-time test to see whether the
filesystem is doing case-folding, since that can be turned on or off
on a per-filesystem basis.  Also unknown, and which should be tested,
is whether turning off case-folding also turns off Unicode
normalization.  It may be that they did this so that HFS+ could be UFS
compatible, since Darwin *must* be built on a UFS filesystem,
reflecting its Mach/BSD heritage.  (I ran across this while doing my
web research; apparently HFS+ has been causing Apple headaches
internally.  Heh.  :-) )
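
A rough sketch of what such a run-time test could look like, in Python
for brevity (git itself is C, and this is not git code).  The function
probe_filesystem() and the probe file names are hypothetical.  It
checks case-folding and Unicode normalization separately, so it would
also answer the open question of whether the two are switched
together.

```python
import os
import tempfile
import unicodedata

def probe_filesystem(directory):
    """Probe how the filesystem holding `directory` treats file names.

    Hypothetical helper. Paths are passed as bytes so the exact UTF-8
    sequences reach the OS without any re-encoding by Python.
    """
    nfc = unicodedata.normalize("NFC", "probe_\u00e4").encode("utf-8")
    nfd = unicodedata.normalize("NFD", "probe_\u00e4").encode("utf-8")

    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        root = os.fsencode(scratch)

        # Case probe: create a lowercase name, then look it up in uppercase.
        open(os.path.join(root, b"probe_case"), "wb").close()
        folds_case = os.path.exists(os.path.join(root, b"PROBE_CASE"))

        # Normalization probe: create the composed name, look up the decomposed one.
        open(os.path.join(root, nfc), "wb").close()
        folds_unicode = os.path.exists(os.path.join(root, nfd))

        # Does a directory listing hand back the exact bytes we wrote?
        rewrites_name = nfc not in os.listdir(root)

    return {
        "folds_case": folds_case,
        "folds_unicode": folds_unicode,
        "rewrites_name": rewrites_name,
    }

print(probe_filesystem(tempfile.gettempdir()))
```

The result depends entirely on the filesystem under the probed
directory, so the test has to be run per working tree rather than once
per machine.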

> Things like the finder etc, which must be very aware of the fact that
> filenames get corrupted, would presumably internally always convert
> everything they get into NFD in order to compare names from different
> sources. And as part of that, programs may well corrupt the name before
> they then use it to create a pathname.

Well, hopefully not everyone inside Apple's OS groups is a total
moron, and they actually use a utf8_str_equiv() routine instead of
strcmp() to do their Unicode comparisons.  But then again, maybe
not...
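
To make the distinction concrete, here is a minimal sketch of an
equivalence check, again in Python.  utf8_str_equiv() is just the
placeholder name used above, not a real library routine, and
normalizing both sides to NFD is one assumed way of implementing it.
The inputs are compared without either of them being rewritten.

```python
import unicodedata

def utf8_str_equiv(a: bytes, b: bytes) -> bool:
    """True when two UTF-8 byte strings are canonically equivalent.

    Both sides are normalized to NFD only for the comparison; the
    caller's byte strings are left untouched.
    """
    return (unicodedata.normalize("NFD", a.decode("utf-8"))
            == unicodedata.normalize("NFD", b.decode("utf-8")))

composed = b"M\xc3\xa4rchen"     # NFC spelling
decomposed = b"Ma\xcc\x88rchen"  # NFD spelling

print(composed == decomposed)                # False: what a strcmp()-style byte compare sees
print(utf8_str_equiv(composed, decomposed))  # True: same name up to normalization
```

A program that compares this way never needs to change the name it
later hands to open() or creat().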

> The fact that your perl program works under NFS, but creates NFD on a VFAT 
> volume, does imply that they probably used at least some of the same 
> routines they use in HFS+ for VFAT. Not entirely surprising: doing case 
> insensitive stuff with Unicode is nasty code, so why not share it (even if 
> it's then incorrect for FAT)..
> 
> Piece of crap it is, though. Apple has painted themselves into a nasty 
> corner there.

No kidding!!

							- Ted