Re: Git, Mac OS X and German special characters

On Sun, Oct 2, 2011 at 12:02 AM, Michael Witten <mfwitten@xxxxxxxxx> wrote:
> On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@xxxxxx> wrote:
>
>> The question is, should git forbid two filenames that consist
>> of the *same* characters, only differently uni-encoded? I don't
>> think anyone would make two files named 'Büro', with different
>> unicode encodings. But as far as I know that is a shady area.
>
> So, let's leave git's current behavior as the default and provide
> a config variable that when set, tells git to handle file names
> in terms of characters rather than bytes.

I just read the very lengthy discussion here:
http://thread.gmane.org/gmane.comp.version-control.git/70688

Basically all the arguments have already been discussed.

There are various options. Most of them are not mutually exclusive, so
it would also be possible to implement several of them and let the user
pick what (s)he prefers.

* TreatFilenamesAsText, or whatever you want to call it. I.e. treat two
filenames as the same when they are Unicode-equivalent (for example,
the same name once in NFC and once in NFD).

Linus is very much against this because in rare situations, it could
destroy your data, like in this example:

	echo "foo" > Hütte # "Hütte" in NFC
	echo "bar" > Hütte # "Hütte" in NFD

The second write would silently overwrite the file created by the
first write if those filenames were treated as the same. Linus argues
that this kind of behavior is to be avoided because it would more
often lead to unwanted behavior in third-party applications.
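
To make the difference concrete, here is a small Python sketch (not Git
code; the helper name same_as_text is made up for illustration). It
shows that the two spellings of "Hütte" are different byte sequences
and only compare equal after normalization:

```python
import unicodedata

# The same visible name in two Unicode encodings.
nfc = "H\u00fctte"    # precomposed "ü" (U+00FC)
nfd = "Hu\u0308tte"   # "u" followed by combining diaeresis (U+0308)

# Compared as bytes, which is what Git currently does, they differ.
print(nfc.encode("utf-8"))   # b'H\xc3\xbctte'
print(nfd.encode("utf-8"))   # b'Hu\xcc\x88tte'


def same_as_text(a, b):
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc == nfd)              # False
print(same_as_text(nfc, nfd))  # True
```

With a comparison like same_as_text(), the two echo commands above
would hit the same file, which is exactly the case Linus is worried
about.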

* On MacOSX, wrap all filesystem functions (like readdir()) to convert
all filenames to NFC.

MacOSX normalizes the UTF-8 representation of filenames to NFD, whereas
in the most common situations on most other systems you end up with the
filename being in NFC.

As the filename is normalized on OSX anyway, it doesn't matter whether
it is handled as NFC or NFD, and NFC will likely cause less trouble.
This patch doesn't even really need an option.

This was one suggestion by Linus himself:
http://news.gmane.org/find-root.php?message_id=%3calpine.LFD.1.00.0801211323120.2957%40woody.linux%2dfoundation.org%3e
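
As a rough illustration of the idea (in Python rather than in Git's C
code; to_nfc and listdir_nfc are hypothetical names), such a wrapper
would simply normalize every name the filesystem returns before the
rest of the program sees it:

```python
import os
import unicodedata


def to_nfc(names):
    """Normalize each filename in the list to NFC."""
    return [unicodedata.normalize("NFC", name) for name in names]


def listdir_nfc(path="."):
    """Sketch of a readdir() wrapper: list a directory, names in NFC.

    On a filesystem that hands back NFD names, callers only ever see
    the NFC form. Names that are already NFC pass through unchanged.
    """
    return to_nfc(os.listdir(path))


if __name__ == "__main__":
    # An NFD name, as OSX would return it, comes out as NFC.
    print(to_nfc(["Hu\u0308tte"]) == ["H\u00fctte"])
```

A real implementation would also have to convert back when a name is
passed to open(), stat() and so on, which this sketch leaves out.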

* Disallow any files whose filenames are not in NFC at all. This
makes some things a bit safer (like on MacOSX, along with the
previous suggestion) and clearer (you always know that your
filename is in NFC).
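
A check like that could look roughly like the following Python sketch.
The function names are invented here, and where Git would actually
perform such a check (e.g. when adding a path to the index) is left
open:

```python
import unicodedata


def is_nfc(name):
    """True if the name is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", name) == name


def find_non_nfc(paths):
    """Return the paths that would be rejected under an NFC-only rule."""
    return [p for p in paths if not is_nfc(p)]


if __name__ == "__main__":
    candidates = ["README", "H\u00fctte", "Hu\u0308tte"]
    # Only the NFD spelling (the third entry) is reported.
    print(find_non_nfc(candidates) == ["Hu\u0308tte"])
```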

* A more clever readdir() which, when it gets a filename that is not
in the Git index but is Unicode-equivalent to a filename in the Git
index, automatically replaces it with the filename from the index.

This goes halfway towards a TreatFilenamesAsText option but should
cause less trouble.

This probably doesn't need an extra option either, as it should very
likely cause less trouble (on OSX at least; other systems, which
don't mangle the filename, don't need to use this code at all).
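
The matching step of such a clever readdir() can be sketched like this
in Python. This is only an outline of the idea, not an implementation
for Git; match_to_index is a hypothetical name, and the index is
modelled as a plain list of names:

```python
import unicodedata


def match_to_index(dir_entries, index_names):
    """Replace directory entries by their Unicode-equivalent index name.

    An entry that is already in the index byte-for-byte is kept as is.
    An entry that is not in the index, but normalizes to the same NFC
    string as an index name, is replaced by that index name. Everything
    else (e.g. untracked files) is returned unchanged.
    """
    exact = set(index_names)
    # Assumption: the index holds no two names that are equivalent to
    # each other; if it did, the later one would win in this mapping.
    by_nfc = {unicodedata.normalize("NFC", n): n for n in index_names}

    result = []
    for entry in dir_entries:
        if entry in exact:
            result.append(entry)
        else:
            key = unicodedata.normalize("NFC", entry)
            result.append(by_nfc.get(key, entry))
    return result


if __name__ == "__main__":
    index = ["H\u00fctte", "README"]          # index stores NFC
    on_disk = ["Hu\u0308tte", "README", "x"]  # filesystem returned NFD
    print(match_to_index(on_disk, index) == ["H\u00fctte", "README", "x"])
```

Note that the index spelling always wins, so a repository that already
tracks the NFD form keeps it; nothing is forced to NFC here.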

---

I will probably go and try to implement the clever readdir(). And/or
maybe also the NFC conversion in such a readdir() wrapper.