On Mon, May 13, 2019 at 5:56 AM Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Mon, May 13, 2019 at 12:23:29PM +0200, Johannes Schindelin wrote:
> > Hi Elijah,
> >
> > On Sat, 11 May 2019, Elijah Newren wrote:
> >
> > > [...] the craziness is based on how Windows behaves; it seems insane to
> > > me that Windows decides to munge user data (in the form of the command
> > > line provided), so much so that it makes me wonder if I really
> > > understood Hannes' and Dscho's explanations of what it is doing.
> >
> > It is not the user data that is munged by *Windows*, but by *Git for
> > Windows*. The user data on Windows is encoded in UTF-16 (or some slight
> > variant thereof). Git *cannot* handle UTF-16. Git's test suite *cannot*
> > handle UTF-16. So we convert. That's all there is to it.
> >
> > Ciao,
> > Dscho
> >
> > P.S.: Of course it is not *all* there is to it. There is also a current
> > code page which depends on the current user's current locale. We can
> > definitely not rely on that, as Git has no idea about this and would quite
> > positively produce incorrect output because of it. So we really just use
> > the `*W()` functions of the Win32 API (i.e. the ones accepting wide
> > Unicode characters and strings, i.e. UTF-16). I don't think we can do
> > better than that.
>
> We can actually feed valid UTF-8 into a test case.
> (Remember that shell scripts need this octal numbering, see
> t/t0050)

Sure, but that's not useful here. I need to feed both valid and invalid
ISO-8859-7 (or anything *other* than UTF-8) into a test case to verify
how git handles reencoding from something other than UTF-8. I originally
did something like what you proposed, but since the input wasn't UTF-8,
it caused test failures on Windows.

Elijah