Re: Git, Mac OS X and German special characters


 



Please don't cull the list when replying. Reply-to-all is the
standard on git@vger.

On 10/01/2011 08:57 AM, Albert Zeyer wrote:
> On Sat, Oct 1, 2011 at 3:39 PM, Andreas Ericsson <ae@xxxxxx> wrote:
>> On 10/01/2011 07:44 AM, Albert Zeyer wrote:
>>> Hi,
>>>
>>> There are problems on Mac OS X with different UTF-8 encodings of
>>> filenames. The same text can have more than one UTF-8
>>> representation (e.g. a precomposed "ä" versus "a" followed by a
>>> combining diaeresis), and Git treats each representation as a
>>> different filename. This is the actual bug: Git should treat them
>>> all as the same filename. In some cases (as on Mac OS X), the
>>> underlying operating system normalizes filenames to one particular
>>> UTF-8 representation, i.e. it changes the bytes of the filename.
>>>
>>> Similar problems also exist in SVN, for example. This was reported
>>> [here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464),
>>> where you can also find lengthy discussions about the topic, and
>>> [here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).
>>>
>>> This was already reported for Git earlier and there is also a patch
>>> for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).
>>>
>>> What is the state of this? It hasn't been applied yet. Why not?
>>>
>>
>> Because the patch didn't handle repositories containing files whose
>> names have more than one possible representation, and that could
>> have led to silent data loss for unsuspecting users.
>>
>> The real solution to your problem is, unfortunately, to either use
>> a different and more competent filesystem, or to avoid triggering
>> the bugs in the one you're currently using.
> 
> Well, I think it is a bug in Git itself that it treats different UTF-8
> representations of the same filename as different filenames. It
> shouldn't have allowed that in the first place.
> 
> But I see your point. I guess I will work on a patch myself, or
> extend that one.


The trouble is that they may represent two different files on a
different filesystem. The Linux kernel repo has plenty of files
whose names differ only in case, like so:
SOMEFILE_driver.c
somefile_driver.c

This is perfectly valid on all sensible, case-sensitive
filesystems, but breaks horribly on HFS. There are other, far more
"interesting" cases when you involve special characters such as the
German umlaut, or the Swedish åäö characters.
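
To make both failure modes concrete, here is a minimal Python sketch (not code from Git, nor from the patch referenced above) that lists paths which are distinct byte strings but would end up as the same name on a filesystem that ignores case and Unicode normalization. The helper names `fold_key` and `find_collisions` are hypothetical, and using NFC plus `str.casefold()` is only an assumed approximation of HFS+, which applies its own decomposition and case-folding tables.

```python
import unicodedata
from collections import defaultdict


def fold_key(name: str) -> str:
    # Hypothetical approximation of a case-insensitive, normalizing
    # filesystem: compose to NFC, then apply full Unicode case folding.
    # HFS+ really uses its own (NFD-like) tables, so this is only a model.
    return unicodedata.normalize("NFC", name).casefold()


def find_collisions(paths):
    """Return groups of distinct paths that share the same fold_key()."""
    groups = defaultdict(set)
    for p in paths:
        groups[fold_key(p)].add(p)
    # Only keys reached by more than one distinct spelling are collisions.
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    nfc = "\u00e4"   # precomposed "ä" (U+00E4)
    nfd = "a\u0308"  # "a" + COMBINING DIAERESIS (U+0308)
    # Same text, different code points and therefore different UTF-8 bytes.
    print(nfc.encode("utf-8"), nfd.encode("utf-8"))  # b'\xc3\xa4' b'a\xcc\x88'
    print(nfc == nfd)  # False

    names = [
        "SOMEFILE_driver.c",
        "somefile_driver.c",
        nfc + "pfel.txt",
        nfd + "pfel.txt",
        "README",
    ]
    for group in find_collisions(names):
        # Print raw bytes so the two "äpfel.txt" spellings are distinguishable.
        print([n.encode("utf-8") for n in group])
```

With the two kernel-style names and the two spellings of "äpfel.txt", both pairs are reported as collisions, while a plain byte-wise comparison (which is what a case-sensitive, non-normalizing filesystem does) treats each pair as two separate files. That is why such a tree is fine on Linux but breaks on HFS.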

-- 
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

