Re: [PATCH V4] git on Mac OS and precomposed unicode

On 22.01.12 11:03, Nguyen Thai Ngoc Duy wrote:
> 2012/1/22 Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx>:
>>>> In order to prevent that ever a file name in decomposed unicode is
>>>> entering the index, a "brute force" attempt is taken: all arguments into
>>>> git (argv[1]..argv[n]) are converted into precomposed unicode.
> 
> Forgot one more thing. We have case-insensitive support in place
> already, we can hook precomposed form conversion there before
> comparing. In other words we just need to support
> {pre,de}composed-insensitive string compare.
> 
>-- Duy 

Yes, I like that idea.
After doing some experiments with precomposed and decomposed file names,
the motivation for the fix, or rather the root cause, changed.

The Mac OS X file system on VFAT has a kind of schizophrenia:
- unicode file names are written to disk as precomposed,
  and Linux/Windows see them as precomposed.
- readdir() always returns decomposed names.
- open(), fopen(), stat() and lstat() work for both precomposed and
  decomposed names.

Therefore a repository on VFAT under Mac OS X looks as follows:

- file names on disk are precomposed
- file names in the index are decomposed

As long as only Mac OS uses that device, there is no problem.

When we now move e.g. a USB stick using VFAT to Linux,
the decomposed names in the index appear to be deleted and the
precomposed names on disk appear as untracked.
A complete disaster.
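
To make the mismatch concrete, here is a minimal sketch in Python (not part
of the patch). It assumes that "precomposed" and "decomposed" correspond to
Unicode NFC and NFD, and it models the index and the directory listing as
plain lists of names; the function name compare_index_to_disk and the file
name are made up for illustration.

```python
import unicodedata


def compare_index_to_disk(index_names, disk_names):
    """Compare names code point by code point, with no normalization.

    Returns (deleted, untracked): names only in the index and names
    only on disk.
    """
    index_set = set(index_names)
    disk_set = set(disk_names)
    deleted = sorted(index_set - disk_set)
    untracked = sorted(disk_set - index_set)
    return deleted, untracked


# The same visible file name in both forms (hypothetical example name).
precomposed = unicodedata.normalize("NFC", "M\u00e4rchen.txt")
decomposed = unicodedata.normalize("NFD", precomposed)

# Index written on Mac OS (decomposed), directory listing on Linux (precomposed).
deleted, untracked = compare_index_to_disk([decomposed], [precomposed])
```

Both names look identical on screen, but the comparison reports one as
deleted and the other as untracked.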

To keep the file names in the index the same as they are stored
on disk, we can set "core.precomposedunicode" = true.
(Side note: should we rename it to "i18n.precomposedunicode"?)

Now the index and the file names on disk stay the same, and we can move the
USB stick between Linux, Mac OS X, and Windows (since msysGit-1.7.10,
now called Git for Windows, or git under cygwin 1.7).
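
The idea of converting every name that enters git into precomposed form can
be sketched as follows. This is only an illustration in Python, not the C
code of the patch: it assumes NFC is the target form, and the helper names
precompose and precompose_all are hypothetical.

```python
import unicodedata


def precompose(name):
    """Return the name in precomposed form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", name)


def precompose_all(names):
    """Precompose a list of names, e.g. command line arguments or
    the entries returned by a directory listing."""
    return [precompose(name) for name in names]


# A decomposed name as readdir() on Mac OS would return it,
# next to a plain ASCII name that must stay untouched.
listing = ["u\u0302.txt", "README"]
converted = precompose_all(listing)
```

Names that are already precomposed, or pure ASCII, pass through unchanged,
so the conversion is safe to apply to every argument.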

The same problem occurs when Mac OS mounts a network share from Linux
using SAMBA:
readdir() returns decomposed names, while creat() stores file names in
precomposed form on the remote Linux machine.

If we put a file name with decomposed unicode on the Linux machine,
it will be listed as decomposed by readdir() on the Mac OS side.
Trying to access this file fails, because Mac OS tries to open
the precomposed version, and that does not exist.


Another thing:
There are some reasons to avoid decomposed unicode in Linux and Windows:
Many user space programs don't handle decomposed unicode very well.
When e.g. an "û" should be displayed, the output looks like "u^" in many
programs.
And if we need more motivation: decomposed unicode is hard to enter on
the keyboard.
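
For illustration, the two forms of "û" can be inspected with a few lines of
Python. This assumes that precomposed and decomposed correspond to Unicode
NFC and NFD; the helper code_points is made up for this example.

```python
import unicodedata

precomposed = "\u00fb"  # LATIN SMALL LETTER U WITH CIRCUMFLEX
decomposed = unicodedata.normalize("NFD", precomposed)


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


print(code_points(precomposed))  # ['U+00FB']
print(code_points(decomposed))   # ['U+0075', 'U+0302']
```

The decomposed form is a plain "u" followed by a combining circumflex, which
is why programs that do not handle combining characters show it as "u^".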



Then I went back to my original problem
(versioning the "Documents" folder in my Linux $HOME under git
and accessing it from Windows and Mac OS using SAMBA, or cloning it
to a laptop...)

Knowing that
a) Mac OS X handles precomposed and decomposed the same in open()...
b) The user space programs on Mac OS handle precomposed just fine
c) Many user space programs don't display decomposed unicode correctly
d) It is hard to enter decomposed unicode at the keyboard
e) and therefore decomposed unicode is seldom used on Linux
f) Mac OS using SAMBA puts file names in precomposed unicode on the
   remote side

Do we have a motivation for pushing a solution that ignores
the unicode composition?
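
For reference, a compare that ignores composition, in the spirit of the
"{pre,de}composed-insensitive string compare" quoted above, could look
roughly like this. It is a sketch in Python under the assumption that both
sides are normalized to NFC before comparing; the function names and the
ignore_case flag are hypothetical and not taken from git.

```python
import unicodedata


def compare_key(name, ignore_case=False):
    """Build a key that is equal for precomposed and decomposed spellings."""
    key = unicodedata.normalize("NFC", name)
    if ignore_case:
        # Case folding can change composition, so normalize once more.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


def same_name(a, b, ignore_case=False):
    """Return True if a and b differ at most in unicode composition
    (and, optionally, in case)."""
    return compare_key(a, ignore_case) == compare_key(b, ignore_case)


# Precomposed versus decomposed spelling of the same name.
match = same_name("\u00fb.txt", "u\u0302.txt")
```

Such a compare hides the difference inside git, but the names stored in the
index and on disk would still differ.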

I'll send a V5 version with, hopefully, a better motivation.

