On Sun, Oct 2, 2011 at 12:02 AM, Michael Witten <mfwitten@xxxxxxxxx> wrote:
> On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@xxxxxx> wrote:
>
>> The question is, should git forbid two filenames that consist
>> of the *same* characters, only differently uni-encoded? I don't
>> think anyone would make two files named 'Büro', with different
>> unicode encodings. But as far as I know that is a shady area.
>
> So, let's leave git's current behavior as the default and provide
> a config variable that when set, tells git to handle file names
> in terms of characters rather than bytes.

I just read the very lengthy discussion here:
http://thread.gmane.org/gmane.comp.version-control.git/70688

Basically all the arguments have already been discussed. There are various options. Most of them are not mutually exclusive, so it would also be an option to implement most of them and let the user pick what (s)he prefers.

* TreatFilenamesAsText, or whatever you would call it. I.e. treat filenames as the same when they are equal in Unicode. Linus is very much against this because in rare situations it could destroy your data, like in this example:

    echo "foo" > Hütte # "Hütte" in NFC
    echo "bar" > Hütte # "Hütte" in NFD

  The second write would silently overwrite the file created by the first write if those filenames were treated as the same. This kind of behavior is to be avoided, claims Linus, because it would more often lead to unwanted behavior in third-party applications.

* On MacOSX, wrap all filesystem functions (like readdir()) to convert all filenames to NFC. MacOSX normalizes the UTF-8 representation of filenames to NFD, but in most common situations (on most other systems) you end up with the filename in NFC. As the filename is normalized on OSX anyway, it doesn't matter whether it is handled as NFC or NFD, and NFC will likely cause less trouble. This patch doesn't even really need an option. It was one of Linus's own suggestions:
  http://news.gmane.org/find-root.php?message_id=%3calpine.LFD.1.00.0801211323120.2957%40woody.linux%2dfoundation.org%3e

* Disallow files whose filenames are not in NFC altogether. This makes some things a bit safer (like on MacOSX, together with the previous suggestion) and clearer (you always know that your filename is in NFC).

* A cleverer readdir() which, when it gets a filename that is not in the Git index but is Unicode-equivalent to a filename in the Git index, automatically replaces it with the filename from the index. This is a sort of halfway point toward a TreatFilenamesAsText option but should produce less trouble. It probably doesn't need an extra option either (on OSX it should very likely help, and other systems, which don't mangle filenames, don't need to use this code at all).

---

I will probably go and try to implement the clever-readdir(). And/or maybe also the NFC conversion in such a readdir() wrapper.
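For illustration only, here is a rough C sketch of the name-resolution step behind the clever-readdir() idea: try an exact byte match against the index first, and only if that fails compare normalized forms and return the index's spelling. Everything in it is hypothetical and not git code or API: `toy_nfc()` composes only a handful of letters followed by U+0308 COMBINING DIAERESIS (enough for the "Hütte" case), and `resolve_against_index()` takes the index as a plain array of names. Real code would need full Unicode NFC, for example from a normalization library such as ICU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not git code. Toy table: ASCII base letter +
 * U+0308 (UTF-8: 0xCC 0x88) -> precomposed letter. Real NFC covers
 * far more than this. */
struct compose_entry {
	char base;             /* ASCII base letter */
	unsigned char nfc[2];  /* UTF-8 bytes of the precomposed letter */
};

static const struct compose_entry diaeresis_table[] = {
	{ 'a', { 0xC3, 0xA4 } }, /* U+00E4 */
	{ 'o', { 0xC3, 0xB6 } }, /* U+00F6 */
	{ 'u', { 0xC3, 0xBC } }, /* U+00FC */
	{ 'A', { 0xC3, 0x84 } }, /* U+00C4 */
	{ 'O', { 0xC3, 0x96 } }, /* U+00D6 */
	{ 'U', { 0xC3, 0x9C } }, /* U+00DC */
};

/* Writes the toy-NFC form of `in` to `out` (NUL-terminated, truncated
 * if `outsz` is too small). */
static void toy_nfc(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	assert(out != NULL && outsz > 0);
	while (*in && o + 1 < outsz) {
		const unsigned char *p = (const unsigned char *)in;
		int composed = 0;

		/* p[2] is only read when p[1] is 0xCC, i.e. not the NUL. */
		if (p[1] == 0xCC && p[2] == 0x88) {
			size_t n = sizeof(diaeresis_table) / sizeof(diaeresis_table[0]);
			for (size_t i = 0; i < n; i++) {
				if (diaeresis_table[i].base == in[0] && o + 2 < outsz) {
					out[o++] = (char)diaeresis_table[i].nfc[0];
					out[o++] = (char)diaeresis_table[i].nfc[1];
					in += 3; /* consume base letter + 2-byte U+0308 */
					composed = 1;
					break;
				}
			}
		}
		if (!composed)
			out[o++] = *in++;
	}
	out[o] = '\0';
}

/* Hypothetical "clever readdir" step: map a name read from disk to the
 * spelling already tracked in the index, if they differ only in
 * (toy) normalization. Unknown names are returned unchanged. */
static const char *resolve_against_index(const char *disk_name,
                                         const char *const *index, size_t n)
{
	char want[256], have[256];

	for (size_t i = 0; i < n; i++)
		if (strcmp(disk_name, index[i]) == 0)
			return index[i]; /* exact byte match: nothing to do */

	toy_nfc(disk_name, want, sizeof(want));
	for (size_t i = 0; i < n; i++) {
		toy_nfc(index[i], have, sizeof(have));
		if (strcmp(want, have) == 0)
			return index[i]; /* same name, different encoding */
	}
	return disk_name; /* genuinely new file */
}
```

With the index holding the NFC spelling "H\xC3\xBCtte", a directory entry spelled in NFD ("Hu\xCC\x88tte", as MacOSX would return it) resolves to the index entry, while an unrelated name such as "new.txt" comes back unchanged. Because the exact-match pass runs first, names that are byte-identical to the index never get touched. The linear scan is only for clarity; a real implementation would presumably key a hash table on the normalized name.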