Timur Sufiev:
1. Many git front-ends, e.g. TortoiseGit, use an 8-bit character set, not UTF-16:
All of them do, because the output is 8-bit. That is why the internal encoding needs to remain eight-bit, for instance UTF-8.
they call git plumbing commands and pass filenames on the command line (in the local 8-bit encoding).
Well, yes. On Windows, however, there is the complication that the command line is available in two versions: an eight-bit one and a UTF-16 one. Which one is constructed from which depends on how the application was launched. We can read the UTF-16 version and hope that it contains proper names (possibly interpreting the eight-bit version as UTF-8 if necessary).
2. UTF-16 is a proper solution for Windows, but my patch is useful for other OSes with locales different from UTF-8 (e.g. Linux with KOI8-R locale).
Well, your patch re-implements the fopen() calls, converting the file name at that point (as well as readdir() and friends). I would do that on Windows as well, with the modification that on Windows, I would convert to UTF-16 and use _wfopen() instead. On systems that have it, you could also make it convert to UTF-32 and use their wfopen() (I'm not aware of many other OSes having those functions, though).
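As a rough illustration of the conversion step only (written in Python for brevity, not git's C code): the helpers below take a file name held internally as UTF-8 and produce either bytes in a local 8-bit encoding or UTF-16 code units of the kind a wide-character call such as _wfopen() would take. The function names and the use of KOI8-R as the example locale are assumptions made for this sketch.

```python
def utf8_to_local(name_utf8: bytes, local_encoding: str = "koi8_r") -> bytes:
    """Convert an internal UTF-8 file name to the local 8-bit encoding.

    Raises UnicodeEncodeError if a character has no mapping there.
    """
    return name_utf8.decode("utf-8").encode(local_encoding)


def local_to_utf8(name_local: bytes, local_encoding: str = "koi8_r") -> bytes:
    """Convert a name read from the OS (e.g. via readdir) back to UTF-8."""
    return name_local.decode(local_encoding).encode("utf-8")


def utf8_to_utf16(name_utf8: bytes) -> bytes:
    """Convert an internal UTF-8 file name to UTF-16LE without a BOM."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


# Hypothetical example name; Cyrillic letters are representable in KOI8-R.
internal = "файл.txt".encode("utf-8")
local = utf8_to_local(internal)   # one byte per character
wide = utf8_to_utf16(internal)    # two bytes per character here
```

The round trip through local_to_utf8() only returns the original name when every character exists in the local encoding, which is exactly the limitation raised in the next point.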
Still there is a possibility that one day we'll stumble upon some UTF-8 symbol which cannot be correctly mapped into the 8-bit encoding. UTF-16 would be a remedy in this case, but what if we don't have it (see 2)?
That is of course an issue. There are several approaches to that:

 - Fail with an error.
 - Convert to a place-holder character.
 - QP encode the file name, perhaps?

-- 
\\// Peter - http://www.softwolves.pp.se/
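The three approaches listed above (fail, place-holder, QP) could look roughly like this. This is a Python sketch for illustration, not a proposal for the actual C code; the function name, the policy strings and the choice to QP-encode the UTF-8 bytes of the whole name are all assumptions.

```python
import quopri


def encode_name(name: str, encoding: str = "koi8_r", policy: str = "fail") -> bytes:
    """Encode a file name for a local 8-bit encoding under a given policy."""
    if policy == "fail":
        # Raises UnicodeEncodeError on the first unmappable character.
        return name.encode(encoding)
    if policy == "placeholder":
        # Unmappable characters become "?"; the original name is lost.
        return name.encode(encoding, errors="replace")
    if policy == "qp":
        # Quoted-printable over the UTF-8 bytes: pure ASCII and reversible.
        # Note: quopri inserts soft line breaks into long input, so very
        # long names would need extra handling.
        return quopri.encodestring(name.encode("utf-8"))
    raise ValueError("unknown policy: " + policy)


# The euro sign has no mapping in KOI8-R.
lossy = encode_name("price€.txt", policy="placeholder")   # b"price?.txt"
quoted = encode_name("price€.txt", policy="qp")           # b"price=E2=82=AC.txt"
```

Only the QP variant can be reversed (with quopri.decodestring()), at the cost of names that look odd to anything that does not know about the convention.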