Re: [RFC/PATCH v5] git on Mac OS and precomposed unicode

Torsten Bögershausen <tboegi@xxxxxx> writes:

> The problem:

As to the log message, I've rewritten it a bit by reordering
paragraphs and cutting redundant sentences. For exact wording nits,
please check 'pu' when I finish today's integration cycle and push
the results out, but I'll justify the reasoning behind my rewrite
here.

> Mac OS X may manipulate file names containing unicode on file systems
> HFS+, VFAT or SAMBA.
>
> When a file using unicode code points outside ASCII is created on a HFS+ drive,
> the file name is converted into decomposed unicode and written to disk.
> No conversion is done if the file name is already decomposed unicode.

I do not think it matters very much if it is written decomposed
(HFS+) or precomposed (VFAT). The important glitch that affects us
is that readdir(3) on Mac OS X gives the readers decomposed form,
unless over NFS, and the important saving grace that your patch
exploits is that stat/open/etc. will take either form and name the
same file.  So I tried to minimize the description on how it is
written to disk in my rewrite.
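
To make the two forms concrete, here is a minimal Python sketch (not
part of the patch). It uses only the standard unicodedata module; the
helper name same_file_name is made up for this illustration. It shows
that the precomposed and decomposed spellings of "Märchen.txt" are
different strings, and different byte sequences, that compare equal
only after normalization.

```python
import unicodedata

# "Märchen.txt" with a precomposed a-umlaut (U+00E4), as typed on a keyboard.
precomposed = "M\u00e4rchen.txt"
# The same name in decomposed form: "a" followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", precomposed)


def same_file_name(a, b):
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain string comparison sees two different names.
print(precomposed == decomposed)
# The UTF-8 encodings differ as well, so a byte-wise lookup fails too.
print(precomposed.encode("utf-8"))
print(decomposed.encode("utf-8"))
# Only a normalization-aware comparison treats them as the same name.
print(same_file_name(precomposed, decomposed))
```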

> The unicode decomposition creates some problems:
> - "git add" needs the decomposed form on the command line,
>   so that the file name is picked up when readdir() is called
>   to build a list of files on disk.
> - The decomposed form is not (easily) available on the keyboard.
>   To work around this, a wildcard could be used in "git add":
>   Instead of using "git add Märchen.txt" the user needs to enter
>   "git add M*rchen", "git add M<TAB>" or "git add *".
> - "git log", "git mv" and all other commands needs the decomposed form
>   to find the file name which is stored as decomposed in the index.
> - The file names are stored in decomposed unicode in the index, but
>   precomposed on disk.
>     This makes it impossible to use this repository under e.g.
>     Linux or Windows:
>     All files appear to be deleted in the decomposed form and
>     untracked in the precomposed form.

I do not think the "workaround" deserves a mention; the presence of
a mixture of precomposed and decomposed forms is the root cause of
the problem, and even if we prefer to use the precomposed form (for
interoperability if nothing else), a "workaround" that forces more
decomposed input will make the problem worse, not better.
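
The "deleted and untracked" symptom quoted above can be reproduced
without git at all. The following is a rough Python sketch under
stated assumptions: the function names status and nfc are
hypothetical, and an exact string comparison stands in for the
byte-wise path lookup. It is not how git implements its status
computation.

```python
import unicodedata


def status(index_paths, disk_paths):
    """Return (deleted, untracked) using exact string comparison only."""
    index_set, disk_set = set(index_paths), set(disk_paths)
    return sorted(index_set - disk_set), sorted(disk_set - index_set)


def nfc(paths):
    """Normalize every path to precomposed (NFC) form."""
    return [unicodedata.normalize("NFC", p) for p in paths]


# Decomposed name, as it would be recorded from readdir() on Mac OS X.
index_paths = [unicodedata.normalize("NFD", "M\u00e4rchen.txt")]
# Precomposed name, as the same file is listed on Linux or Windows.
disk_paths = ["M\u00e4rchen.txt"]

# With mixed forms, the one file shows up as both deleted and untracked.
deleted, untracked = status(index_paths, disk_paths)
# With both sides in precomposed form, nothing is reported.
clean = status(nfc(index_paths), nfc(disk_paths))
```

The point of the sketch is only that consistency matters: once both
sides use one form, the spurious differences disappear.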

> Knowing that Mac OS X writes file names as precomposed to disk,

Again, how it writes is not important; what matters is that
readdir(3) gives us a name that differs from the one we passed to
creat(2).
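
One way to observe this on a given filesystem is to create a file
under a precomposed name and check what a directory listing returns.
The sketch below is an illustration only; the function name
readdir_preserves_name is hypothetical, and the result depends
entirely on the filesystem holding the temporary directory, so no
particular outcome is claimed here.

```python
import os
import tempfile


def readdir_preserves_name(name="M\u00e4rchen.txt"):
    """Create `name` in a scratch directory and report whether a
    directory listing returns exactly the same string."""
    with tempfile.TemporaryDirectory() as scratch:
        # Create an empty file under the requested name.
        with open(os.path.join(scratch, name), "w"):
            pass
        # os.listdir() reports the names as the operating system returns them.
        return os.listdir(scratch) == [name]


if __name__ == "__main__":
    print(readdir_preserves_name())
```

A False result would mean the listing returned a different form of
the name than the one used to create the file, which is the glitch
described above.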

> The argv[] conversion allows to use the TAB filename completion done by the
> shell on command line.

Yes, this is exactly why "workaround" is not a workaround, but is
yet another problem.
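
The idea behind the argv[] conversion can be sketched in a few lines
of Python. This is an assumption-laden illustration, not the patch
itself (which is C code and is not reproduced here); precompose_argv
is a hypothetical name. Arguments that arrive in decomposed form, for
example from TAB completion, are normalized to precomposed form
before they are compared against stored paths.

```python
import unicodedata


def precompose_argv(argv):
    """Return a copy of argv with every argument normalized to
    precomposed (NFC) form; pure-ASCII arguments are unchanged."""
    return [unicodedata.normalize("NFC", arg) for arg in argv]


# Simulate what TAB completion would hand over: a decomposed file name.
completed = unicodedata.normalize("NFD", "M\u00e4rchen.txt")
args = precompose_argv(["add", completed])
# args[1] is now the precomposed spelling, whichever form was typed.
print(args[1] == "M\u00e4rchen.txt")
```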

> When creating a new git repository with "git init" or "git clone",
> "core.precomposedunicode" will be set "false".
>
> The user needs to activate this feature manually.
> She typically sets core.precomposedunicode to "true" on HFS and VFAT,
> or file systems mounted via SAMBA onto a Linux box.

We might want to change this default a couple of cycles after this
feature hits 'next' and people gain experience with it.

I think the reason to choose the safer "false" default is to keep
the behaviour of an old repository on Mac OS X and that of a new
repository cloned from it, also on Mac OS X, the same. But if we can
detect that the filesystem is broken, and have code to work around
the breakage, I think the longer term direction would be to set it
to "true", so that the resulting history records paths consistently
in precomposed form (another choice might be to normalize to
decomposed form, but my understanding is that it would not help
anybody, as nobody other than Mac OS X uses it).

Thanks.