Torsten Bögershausen <tboegi@xxxxxx> writes:

> The problem:

As to the log message, I've rewritten it a bit by reordering
paragraphs and cutting redundant sentences.  For exact wording nits,
please check 'pu' when I finish today's integration cycle and push
the results out, but I'll justify the reasoning behind my rewrite
here.

> Mac OS X may manipulate file names containing unicode on file systems
> HFS+, VFAT or SAMBA.
>
> When a file using unicode code points outside ASCII is created on a HFS+ drive,
> the file name is converted into decomposed unicode and written to disk.
> No conversion is done if the file name is already decomposed unicode.

I do not think it matters very much if it is written decomposed
(HFS+) or precomposed (VFAT).  The important glitch that affects us
is that readdir(3) on Mac OS X gives the readers decomposed form,
unless over NFS, and the important saving grace that your patch
exploits is that stat/open/etc. will take either form and name the
same file.  So I tried to minimize the description on how it is
written to disk in my rewrite.

> The unicode decomposition creates some problems:
> - "git add" needs the decomposed form on the command line,
>   so that the file name is picked up when readdir() is called
>   to build a list of files on disk.
> - The decomposed form is not (easily) available on the keyboard.
>   To work around this, a wildcard could be used in "git add":
>   Instead of using "git add Märchen.txt" the user needs to enter
>   "git add M*rchen", "git add M<TAB>" or "git add *".
> - "git log", "git mv" and all other commands needs the decomposed form
>   to find the file name which is stored as decomposed in the index.
> - The file names are stored in decomposed unicode in the index, but
>   precomposed on disk.
>   This makes it impossible to use this repository under e.g.
>   Linux or Windows:
>   All files appear to be deleted in the decomposed form and
>   untracked in the precomposed form.
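
To make the mismatch described above concrete, here is a minimal Python sketch.  It is not the patch's code; the helper names to_precomposed() and to_decomposed() are made up for illustration, and it only shows why a name typed in precomposed form fails to compare equal to the decomposed form that a directory listing might return, and how normalizing both to one form fixes the comparison.

```python
import unicodedata


def to_precomposed(name: str) -> str:
    """Hypothetical helper: normalize a file name to precomposed form (NFC)."""
    return unicodedata.normalize("NFC", name)


def to_decomposed(name: str) -> str:
    """Hypothetical helper: normalize a file name to decomposed form (NFD)."""
    return unicodedata.normalize("NFD", name)


# What the user types on the command line: "ä" as one code point (U+00E4).
typed = "M\u00e4rchen.txt"

# What a decomposing directory listing would hand back:
# "a" followed by the combining diaeresis (U+0308).
listed = "Ma\u0308rchen.txt"

# The two spellings render identically but are different strings,
# so a naive comparison between argv and the listing fails.
print(typed == listed)                                # False
print(len(typed.encode("utf-8")), len(listed.encode("utf-8")))  # 12 13

# Normalizing both sides to the same form makes them match.
print(to_precomposed(listed) == to_precomposed(typed))  # True
print(to_decomposed(typed) == listed)                   # True
```

Which form to normalize to is a separate policy question; the sketch only shows that picking one form consistently is what makes the names comparable.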

I do not think "workaround" deserves a mention; the presence of a
mixture of precomposed and decomposed forms is the root cause of the
problem, and even if we prefer to use precomposed form (for
interoperability if nothing else), the "workaround" to force more
decomposed input will make the problem worse, not better.

> Knowing that Mac OS X writes file names as precomposed to disk,

Again, how it writes is not important; what matters is that
readdir(3) gives us something different from what we used for
creat(2).

> The argv[] conversion allows to use the TAB filename completion done by the
> shell on command line.

Yes, this is exactly why "workaround" is not a workaround, but is
yet another problem.

> When creating a new git repository with "git init" or "git clone",
> "core.precomposedunicode" will be set "false".
>
> The user needs to activate this feature manually.
> She typically sets core.precomposedunicode to "true" on HFS and VFAT,
> or file systems mounted via SAMBA onto a Linux box.

We might want to change this a couple of cycles after this feature
hits 'next' and people gain experience with it.  I think the reason
to choose the safer "false" default is to keep the behaviours between
an old repository on Mac OS X and a new repository cloned from it
also on Mac OS X the same, but if we can detect that the filesystem
is broken, and have code to work around the breakage, I think the
longer term direction would be to set it so that the resulting
history records paths consistently in precomposed form (another
choice might be to normalize to decomposed form, but my understanding
is that it would not help anybody, as nobody other than Mac OS X uses
it).

Thanks.