On Mon, May 13, 2019 at 12:23:29PM +0200, Johannes Schindelin wrote:
> Hi Elijah,
>
> On Sat, 11 May 2019, Elijah Newren wrote:
>
> > [...] the craziness is based on how Windows behaves; it seems insane to
> > me that Windows decides to munge user data (in the form of the command
> > line provided), so much so that it makes me wonder if I really
> > understood Hannes' and Dscho's explanations of what it is doing.
>
> It is not the user data that is munged by *Windows*, but by *Git for
> Windows*. The user data on Windows is encoded in UTF-16 (or some slight
> variant thereof). Git *cannot* handle UTF-16. Git's test suite *cannot*
> handle UTF-16. So we convert. That's all there is to it.
>
> Ciao,
> Dscho
>
> P.S.: Of course it is not *all* there is to it. There is also a current
> code page which depends on the current user's current locale. We can
> definitely not rely on that, as Git has no idea about this and would quite
> positively produce incorrect output because of it. So we really just use
> the `*W()` functions of the Win32 API (i.e. the ones accepting wide
> Unicode characters and strings, i.e. UTF-16). I don't think we can do
> better than that.

We can actually feed valid UTF-8 into a test case.
(Remember that shell scripts need the octal notation for this, see t/t0050.)
Take the "ä" code point:

$ auml=$(printf '\303\244')
$ printf $auml
ä

Now we can feed those 2 bytes (which are valid UTF-8) into Git and say
"convert them from ISO-8859-1 into UTF-8", resulting in 4 bytes: read as
ISO-8859-1, each of the 2 bytes is a character above 0x7F, and each of
those needs 2 bytes in UTF-8.

Is my explanation clear enough? If not, please tell me.
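For illustration, the byte arithmetic behind the "2 bytes in, 4 bytes out" claim can be checked outside Git. This is only a sketch that assumes iconv(1) and od(1) are installed; it demonstrates the encoding math and is not the code path Git itself uses for the conversion.

```shell
#!/bin/sh
# "ä" (U+00E4) encoded as UTF-8 is the two bytes 0xC3 0xA4 (octal \303\244).
auml=$(printf '\303\244')

# Dump the raw bytes in hex: c3 a4
printf '%s' "$auml" | od -A n -t x1

# Treat the same 2 bytes as ISO-8859-1, i.e. as the two characters
# U+00C3 and U+00A4, and re-encode them as UTF-8.  Each of those
# characters takes 2 bytes in UTF-8, so 4 bytes come out: c3 83 c2 a4
printf '%s' "$auml" | iconv -f ISO-8859-1 -t UTF-8 | od -A n -t x1
```

The first 0xC3 byte survives unchanged only by coincidence (U+00C3 happens to start with the same UTF-8 lead byte); the byte count is what a test would check.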