Re: [PATCH v3 5/5] fast-export: do automatic reencoding of commit messages only if requested

Elijah Newren <newren@xxxxxxxxx> · Mon, 13 May 2019 06:29:28 -0700

On Mon, May 13, 2019 at 5:56 AM Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Mon, May 13, 2019 at 12:23:29PM +0200, Johannes Schindelin wrote:
> > Hi Elijah,
> >
> > On Sat, 11 May 2019, Elijah Newren wrote:
> >
> > > [...] the craziness is based on how Windows behaves; it seems insane to
> > > me that Windows decides to munge user data (in the form of the command
> > > line provided), so much so that it makes me wonder if I really
> > > understood Hannes' and Dscho's explanations of what it is doing.
> >
> > It is not the user data that is munged by *Windows*, but by *Git for
> > Windows*. The user data on Windows is encoded in UTF-16 (or some slight
> > variant thereof). Git *cannot* handle UTF-16. Git's test suite *cannot*
> > handle UTF-16. So we convert. That's all there is to it.
> >
> > Ciao,
> > Dscho
> >
> > P.S.: Of course it is not *all* there is to it. There is also a current
> > code page which depends on the current user's current locale. We can
> > definitely not rely on that, as Git has no idea about this and would quite
> > positively produce incorrect output because of it. So we really just use
> > the `*W()` functions of the Win32 API (i.e. the ones accepting wide
> > Unicode characters and strings, i.e. UTF-16). I don't think we can do
> > better than that.
>
> We can actuall feed valid UTF-8 into a test case.
> (Remember that shell scripts need this octal numbering, see
> t/t0050)

Sure, but that's not useful here.  I need to feed both valid and
invalid ISO-8859-7 (or anything *other* than UTF-8) into a test case,
in order to verify how git handles reencoding from something other
than utf-8.  I did something like what you proposed originally, but
since it wasn't utf-8 it caused test failures on Windows.

Elijah