Re: git on MacOSX and files with decomposed utf-8 file names




On 16/1/2008, at 16:43, Kevin Ballard wrote:

On Jan 16, 2008, at 10:34 AM, Johannes Schindelin wrote:

On Wed, 16 Jan 2008, Mark Junker wrote:

I have some files like "Lüftung.txt" in my repository. The strange thing is that I can pull / add / commit / push those files without problems, but git-status always complains that those files are untracked (but not missing).

This is a known problem.  Unfortunately, no one has implemented a fix,
although if you're serious about it, I can point you to threads where it
has been hinted how to solve the issue.

FWIW the issue is that Mac OS X decides that it knows better how to encode
your filename than you could yourself.

More like, Mac OS X has standardized on Unicode and the rest of the world hasn't caught up yet. Git is the only tool I've ever heard of that has a problem with OS X using Unicode.

As far as I know, Subversion has basically exactly the same problem, and any time you consume/produce files on Mac OS X that are consumed/produced on other platforms, you will run into this kind of issue, with any software.

Tell Mac OS X to write a file with "ó" in the file name ("\xc3\xb3" in UTF-8), and it will "normalize" it prior to writing by converting it into a decomposed form (that is, ASCII "o" followed by "\xcc\x81", or "combining acute accent"). So they're both valid Unicode, both valid UTF-8, and they encode exactly the same characters but the byte stream is different.
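The byte-level difference is easy to reproduce with Python's standard `unicodedata` module. This is a minimal sketch: it uses Unicode's standard NFD form, which is close to, but not guaranteed to be identical to, the decomposition variant the Mac OS X filesystem actually applies.

```python
import unicodedata

# "ó" as a single precomposed code point, U+00F3
precomposed = "\u00f3"

# Standard NFD: "o" followed by U+0301 COMBINING ACUTE ACCENT.
# Assumed close to, not necessarily identical to, the filesystem's own decomposition.
decomposed = unicodedata.normalize("NFD", precomposed)

print(precomposed.encode("utf-8"))  # b'\xc3\xb3'
print(decomposed.encode("utf-8"))   # b'o\xcc\x81'

# Same visible character, different code points and different bytes
print(precomposed == decomposed)    # False

# Re-composing (NFC) recovers the original form
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```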

If you only work on Mac OS X then this will never be a problem, because all the files you create, and therefore all the files you add to your Git repository, will have their names in decomposed UTF-8. But when you start cloning repositories containing files added on other systems, which might use precomposed rather than decomposed UTF-8, you'll run into exactly this kind of problem. The git.git repo itself has one such file (gitweb/test/Märchen, if I remember correctly, which Git reports as untracked).
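To see why the mismatch surfaces as an "untracked" file, here is a simplified model, not Git's actual code. It assumes the tool compares path names as raw bytes. The hypothetical `untracked()` helper checks each directory-listing path against the index paths, and can optionally normalize both sides to NFC first, which is one possible shape for the kind of workaround mentioned in this thread.

```python
import unicodedata

def untracked(index_paths, worktree_paths, normalize=False):
    """Hypothetical helper: return worktree paths that are not in the index.

    Paths are compared as raw UTF-8 bytes, as a byte-oriented tool would.
    With normalize=True, both sides are converted to NFC before comparing.
    """
    def key(path):
        if normalize:
            path = unicodedata.normalize("NFC", path)
        return path.encode("utf-8")

    tracked = {key(p) for p in index_paths}
    return [p for p in worktree_paths if key(p) not in tracked]

# Assumed scenario: the file was committed elsewhere with a precomposed name,
# but listing the directory on the Mac returns the decomposed form.
index = ["gitweb/test/M\u00e4rchen"]
worktree = [unicodedata.normalize("NFD", "gitweb/test/M\u00e4rchen")]

print(untracked(index, worktree))                  # the decomposed path is reported as untracked
print(untracked(index, worktree, normalize=True))  # []
```

Normalizing at comparison time is only one design choice; it assumes that two names differing only in normalization form should be treated as the same file.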

Now, Mac OS X's behaviour is not entirely "insane" as some would claim; there is indeed a rationale behind it even if you don't agree with it, but it *does* produce some unfortunate teething problems for people wanting to use Mac OS X in a cross-platform environment.

Here are some Apple docs on the subject:

http://developer.apple.com/qa/qa2001/qa1173.html

http://developer.apple.com/qa/qa2001/qa1235.html

I personally wish that UTF-8 didn't allow different normalization forms; then this kind of problem wouldn't arise. But it has arisen and we have to live with it. Some workarounds have been proposed for Git, but I haven't seen any convincing proposals yet.

Cheers,
Wincent


