Lars Noschinski:
> Using no encoding for filenames was the obvious (and, I would argue, correct) choice. Unix filenames are specified to be a sequence of bytes, excluding '/' and '\0'.
I know the Unix way of thinking lends itself to such a design. This is one of the few cases where I personally think Unix has got it wrong, and Windows (NT) has got it right. Then again, the Unix design predates the locale issue by quite some time, so it is not difficult to see where it comes from.
> Changing the filename on checkout, so that the user sees an Ü regardless of his or her locale (instead of the byte 0xDC, which only resolves to an Ü on Latin-1), would be an absolutely broken concept here.
Why would it? It is my view, as a user, of my files that defines how their names are looked upon. Say I have three machines: a Linux box using an ISO 8859-1 locale, an OS X box (where, I believe, the file APIs use UTF-8; someone please correct me if I'm wrong), and a Windows box (which uses UTF-16 at the file system layer, but does provide compatibility functions that take char pointers). I create a file called "Ü.txt" on each of them. That name would be the byte sequence "DC 2E 74 78 74" on the Linux box, "C3 9C 2E 74 78 74" on the OS X box (or possibly something else, since I believe OS X decomposes the string), and the UTF-16 code units "00DC 002E 0074 0078 0074" on the Windows box. I still see these three file names as equal.
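The byte sequences above can be reproduced with a short Python sketch. It assumes that the OS X decomposed form is Unicode NFD (HFS+ actually uses a close variant of NFD). It also prints UTF-16 in big-endian order so that the bytes read like the code units listed above. The helper name `hexbytes` is only for illustration.

```python
import unicodedata

name = "Ü.txt"  # "Ü" is U+00DC

def hexbytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

latin1 = name.encode("latin-1")                                # Linux box, ISO 8859-1 locale
utf8_nfc = name.encode("utf-8")                                # precomposed UTF-8
utf8_nfd = unicodedata.normalize("NFD", name).encode("utf-8")  # assumed decomposed form: U + U+0308
utf16 = name.encode("utf-16-be")                               # UTF-16 code units, big-endian, no BOM

print(hexbytes(latin1))    # DC 2E 74 78 74
print(hexbytes(utf8_nfc))  # C3 9C 2E 74 78 74
print(hexbytes(utf8_nfd))  # 55 CC 88 2E 74 78 74
print(hexbytes(utf16))     # 00 DC 00 2E 00 74 00 78 00 74

# All four decode to the same text once normalized to NFC...
decoded = [
    latin1.decode("latin-1"),
    utf8_nfc.decode("utf-8"),
    utf8_nfd.decode("utf-8"),
    utf16.decode("utf-16-be"),
]
assert all(unicodedata.normalize("NFC", s) == name for s in decoded)

# ...but the raw Latin-1 bytes are not valid UTF-8 on their own.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError:
    print("DC 2E 74 78 74 is not valid UTF-8")
```

The bytes differ on every host, yet the text they stand for is the same. This is the sense in which the three names are "equal".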
If I were to create a Git repository on each of the three machines, add the file to it, and then clone it on one of the other machines, *I* would assume that the file names were converted to fit the host operating system.
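The conversion described above can be sketched in Python as a round trip. This is a hypothetical illustration of the behaviour the message asks for, not code from Git. It assumes the repository stores names as NFC-normalized UTF-8. The functions `to_host_name` and `from_host_name` are invented for this sketch, and each host's encoding is passed in explicitly rather than read from the locale.

```python
import unicodedata

def to_host_name(repo_name: bytes, host_encoding: str, decompose: bool = False) -> bytes:
    """Hypothetical checkout step: re-encode a repository path (assumed UTF-8)
    into the byte sequence the host file system expects."""
    text = repo_name.decode("utf-8")
    if decompose:
        # For a host that stores decomposed names (assumed NFD here).
        text = unicodedata.normalize("NFD", text)
    # Raises UnicodeEncodeError if the host encoding cannot represent the name.
    return text.encode(host_encoding)

def from_host_name(host_name: bytes, host_encoding: str) -> bytes:
    """Hypothetical commit step: decode host bytes, normalize to NFC, store as UTF-8."""
    text = unicodedata.normalize("NFC", host_name.decode(host_encoding))
    return text.encode("utf-8")

stored = "Ü.txt".encode("utf-8")
linux = to_host_name(stored, "latin-1")                # ISO 8859-1 locale
mac = to_host_name(stored, "utf-8", decompose=True)    # decomposed UTF-8
windows = to_host_name(stored, "utf-16-le")            # UTF-16 as laid out in memory on x86

# Each host gets different bytes, but all map back to the same stored name.
assert from_host_name(linux, "latin-1") == stored
assert from_host_name(mac, "utf-8") == stored
assert from_host_name(windows, "utf-16-le") == stored
```

The weak point of such a scheme is visible in `to_host_name`: a name that the host encoding cannot represent (for example, a non-Latin name on a Latin-1 box) has no valid checkout form.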
> IMHO, having encoding-specific open functions is begging for problems.
Indeed. That's why I like the Windows wchar_t APIs, and dislike the Unix and Linux char APIs, which in some ways depend on the user's locale.
-- \\// Peter - http://www.softwolves.pp.se/