Please don't cull the list when replying. Reply-to-all is the standard
on git@vger.

On 10/01/2011 08:57 AM, Albert Zeyer wrote:
> On Sat, Oct 1, 2011 at 3:39 PM, Andreas Ericsson<ae@xxxxxx> wrote:
>> On 10/01/2011 07:44 AM, Albert Zeyer wrote:
>>> Hi,
>>>
>>> There are problems on MacOSX with different UTF8 encodings of
>>> filenames. A unicode string has multiple ways to be represented as
>>> UTF8 and Git treats them as different filenames. This is the actual
>>> bug. It should treat them all as the same filename. In some cases (as
>>> on MacOSX), the underlying operating system may use a normalized UTF8
>>> representation in some sort, i.e. change the actual UTF8 filename
>>> representation.
>>>
>>> Similar problems also exist in SVN, for example. This was reported
>>> [here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464).
>>> There you can find also lengthy discussions about the topic. And also
>>> [here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).
>>>
>>> This was already reported for Git earlier and there is also a patch
>>> for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).
>>>
>>> I wonder about the state of this. This hasn't been applied yet. Why?
>>>
>>
>> Because the patch didn't address repositories carrying files with
>> more than one possible representation of the filename and that
>> could have led to silent loss of data for unsuspecting users.
>>
>> The real solution to your problem is, unfortunately, to either use
>> a different and more competent filesystem, or to avoid triggering
>> the bugs in the one you're currently using.
>
> Well, I think it is a bug in Git itself that it treats different UTF8
> representations of the same filename as different filenames. It
> shouldn't have allowed such in the first place.
>
> But I see your point. I guess I will work myself on a patch here or
> extend that one.
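
To make the point under discussion concrete, here is a minimal Python sketch. It is not Git code and not the patch mentioned above; the example filename and the helpers `same_name` and `colliding_names` are hypothetical, chosen only for illustration. It shows two UTF-8 spellings of one visible filename that differ byte for byte, and how a tree holding both spellings would collapse onto a single name once names are normalized, which is the data-loss case described above.

```python
import unicodedata
from collections import defaultdict

# Two spellings of the same visible name (illustrative example only).
composed = "M\u00fcller.txt"     # U+00FC, precomposed "u with diaeresis"
decomposed = "Mu\u0308ller.txt"  # "u" followed by U+0308 COMBINING DIAERESIS


def same_name(a, b, form="NFC"):
    """Hypothetical helper: compare two names after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def colliding_names(names, form="NFC"):
    """Hypothetical helper: group names that become identical once normalized.

    Only groups with more than one member are returned; these are the
    entries that could not coexist if names were normalized.
    """
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize(form, name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # A byte-wise comparison sees two different names.
    print(composed.encode("utf-8") == decomposed.encode("utf-8"))
    # A normalized comparison sees one name.
    print(same_name(composed, decomposed))
    # A tree containing both spellings has a collision under normalization.
    print(colliding_names([composed, decomposed, "README"]))
```

A check along the lines of `colliding_names` is one way a tool could detect the problematic case before normalizing anything, instead of silently keeping one of the two files.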

The trouble is that they may represent two different files on a
different filesystem. The Linux kernel repo has plenty of files
that exist with both uppercase and lowercase characters, like so:

  SOMEFILE_driver.c
  somefile_driver.c

This is perfectly valid on all sensible and case-sensitive filesystems,
but breaks horribly on HFS. There are other, far more "interesting"
cases when you involve special chars such as the German umlauts, or
the Swedish åäö characters.

-- 
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.