Re: git on MacOSX and files with decomposed utf-8 file names

On 16 Jan 2008, at 17:32, Johannes Schindelin wrote:

>>> FWIW the issue is that Mac OS X decides that it knows better how to
>>> encode your filename than you could yourself.
>>
>> More like, Mac OS X has standardized on Unicode and the rest of the
>> world hasn't caught up yet. Git is the only tool I've ever heard of that
>> has a problem with OS X using Unicode.
>
> No. That's not at all the problem. Mac OS X insists on storing _another_
> encoding of your filename.  Both are UTF-8.  Both encode the _same_
> string.  Yet they are different, bytewise.  For no good reason.
>
> Stop spreading FUD. Git can handle Unicode just fine. In fact, Git does
> not _care_ how the filename is encoded, it _respects_ the user's choice,
> not only of the encoding _type_, but the _encoding_, too.

"FUD" is a bit strong, don't you think? HFS+ is the way it is and it would be nice if Git could deal with it.

The problem is that HFS+ normalizes filenames to avoid multiple files that appear to have the same name (e.g. "M<A WITH UMLAUT>rchen" vs "Ma<UMLAUT MODIFIER>rchen", in gitweb/test). This is sort of like case insensitivity, except that the filename is normalized when the file is _created_, so the name stored on disk can differ from the name that was asked for. Git, not unreasonably, expects a file to keep the name it was created with.
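To make the two forms concrete, here is a minimal Python sketch. It assumes the standard Unicode NFC and NFD forms as stand-ins for "precomposed" and "decomposed"; HFS+ uses its own variant of decomposition, which should agree with NFD for a plain "ä" but is not identical in general. The variable names are only for illustration.

```python
import unicodedata

# "Märchen" written with the single precomposed code point U+00E4.
name = "M\u00e4rchen"

composed = unicodedata.normalize("NFC", name)    # U+00E4
decomposed = unicodedata.normalize("NFD", name)  # "a" followed by U+0308

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Same visible string, different byte sequences.
print(composed_bytes)    # b'M\xc3\xa4rchen'
print(decomposed_bytes)  # b'Ma\xcc\x88rchen'
print(composed_bytes == decomposed_bytes)  # False

# Normalizing both sides to one form makes them compare equal again.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A tool that compares filenames byte by byte will treat these as two different names, even though they render identically.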

As far as I can tell, as long as you add all your internationally becharactered files to Git from an HFS+ file system using a GUI or command-line completion, you'll be okay. Trouble starts when you check in a file with the composed form of a character, either by typing the name on the command line (I'm not sure about this one) or by committing on another OS. Git will store the filename in composed form, but the Mac's filesystem will decompose the filename when you check the file out.

The result looks like this:

vredefort:[git]% git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	gitweb/test/Märchen
nothing added to commit but untracked files present (use "git add" to track)

(this is directly after checking out git.git @ v1.5.4-rc3)

There are two things to note here. One is that Git thinks that there is a new file called "gitweb/test/Märchen" (decomposed) when it's "really" just the same "gitweb/test/Märchen" (precomposed) that's in the repository. The other is that Git _thinks_ that the "gitweb/test/Märchen" (precomposed) it's expecting is still there, because the filesystem, when asked for "gitweb/test/Märchen" in any form, will return the file "gitweb/test/Märchen" (decomposed).
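The first effect can be imitated in a few lines of Python. This is only a sketch of the comparison problem, not how Git is implemented: the helper names `untracked` and `untracked_normalized` are hypothetical, NFD again stands in for what HFS+ hands back, and the directory listing is simulated rather than read from a real filesystem.

```python
import unicodedata

def untracked(index_names, dir_names):
    """Names on disk that have no byte-identical entry in the index."""
    tracked = {n.encode("utf-8") for n in index_names}
    return [n for n in dir_names if n.encode("utf-8") not in tracked]

def untracked_normalized(index_names, dir_names):
    """Same check, but with both sides normalized to NFC first."""
    tracked = {unicodedata.normalize("NFC", n) for n in index_names}
    return [n for n in dir_names
            if unicodedata.normalize("NFC", n) not in tracked]

# The repository records the precomposed name ...
index_names = ["gitweb/test/M\u00e4rchen"]
# ... but the (simulated) filesystem lists it back decomposed.
dir_names = [unicodedata.normalize("NFD", n) for n in index_names]

naive = untracked(index_names, dir_names)
normalized = untracked_normalized(index_names, dir_names)

print(len(naive))       # 1: the decomposed name looks like a new file
print(len(normalized))  # 0: after normalization the names match
```

Normalizing on comparison is just one conceivable approach; whether it is a sane one for Git is a separate question.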

Trying to check out the "next" branch at this point is a pain, since next's "Märchen" (precomposed) would overwrite the untracked "Märchen" (decomposed).

I can't provide links to any previous discussions about this, but here's Apple's Technical Q&A on the subject:

http://developer.apple.com/qa/qa2001/qa1235.html

Finding a sane way of allowing git to handle this behaviour is left as an exercise for the reader.

Eyvind Bernhardsen
