Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 08 Feb 2019 09:50:07 -0800

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

>> So would you suggest that we just skip this test on Alpine Linux?
>
> That's not exactly what I said. If Alpine Linux users are never going to
> use this functionality and don't care that it's broken, then that's a
> fine solution.
>
> As originally mentioned, musl could change its libiconv to write a BOM,
> which would make it compatible with other known iconv implementations.
>
> There's also the possibility of defining NO_ICONV. That basically means
> that your system won't support encodings, and then this test shouldn't
> matter.
>
> Finally, you could try applying a patch to the test to make it write the
> BOM for UTF-16 since your iconv doesn't. I expect that the test will
> fail again later on once you've done that, though.

Sorry for being late to the party, but is the crux of the issue this
piece early in the test?

    printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
    ...
    cp test.utf16.raw test.utf16 &&
    ...
    git add .gitattributes test.utf16 test.utf16lebom &&

where we expect "iconv -t UTF-16" to mean "write UTF-16 in whatever
byte order you choose, but do write a BOM", and the iconv
implementations we have seen so far are in line with that
expectation, but the one on Alpine writes UTF-16 in big endian
without a BOM?

If that is the case, I think it is our expectation that is at fault
here, as I think the most natural interpretation of "UTF-16"
without any modifiers (like "BE") ought to be "a UTF-16 stream
expressed in any way of the writer's choice, as long as it is readable
by standards-compliant readers", in other words, "write UTF-16 in
whatever byte order you choose, with or without a BOM, but if you
omit the BOM, you SHOULD write in big endian".  So

 - If our later test assumes that test.utf16 is UTF-16 with a BOM,
   that already assumes too much;

 - If our later test assumes that test.utf16 is UTF-16 in big endian,
   that assumes too much, too.
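
To illustrate the reading rule described above (honor the BOM if there
is one, otherwise assume big endian), here is a rough sketch of how a
reader could decide on the byte order.  The helper name
utf16_byte_order and its output tokens are made up for this
illustration; nothing like it exists in the test script.

```shell
# Hypothetical helper: report how a reader would interpret a file that
# is labelled plain "UTF-16", based only on its first two bytes.
utf16_byte_order () {
	# Dump the first two bytes as hex and strip all whitespace.
	case "$(od -An -tx1 -N2 "$1" | tr -d ' \t\n')" in
	feff) echo be-bom ;;      # BOM present, big endian
	fffe) echo le-bom ;;      # BOM present, little endian
	*)    echo be-assumed ;;  # no BOM: assume big endian
	esac
}
```

Given a file written by an iconv that omits the BOM, this would report
"be-assumed", which is exactly the case our test does not anticipate.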

As suggested earlier in the thread, the easiest workaround would be
to update the preparation of test.utf16.raw to force big endian
with BOM, by prepending the BE BOM by hand to the "iconv -t UTF-16BE"
output (I am assuming that UTF-16BE will continue to mean "big endian
without BOM" in the future).  That would make sure that the
assumptions later tests make about test.utf16 hold true.
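
Something along these lines, perhaps (untested against the real test
script; the sample value of $text is only a stand-in, and I am assuming
the local iconv accepts "UTF-16BE" as an encoding name and emits no BOM
for it):

```shell
# Stand-in sample; the real test defines its own $text.
text="hello\nworld\n"

# Write the big-endian BOM (U+FEFF, bytes FE FF) by hand, using octal
# escapes, which printf handles portably.
printf '\376\377' >test.utf16.raw &&
# Append the payload in explicit big endian, which carries no BOM.
printf "$text" | iconv -f UTF-8 -t UTF-16BE >>test.utf16.raw &&
# Show the leading bytes for a quick sanity check.
od -An -tx1 -N4 test.utf16.raw
```

The dump should begin with "fe ff" no matter what the local iconv does
for a bare "UTF-16", so the result no longer depends on that choice.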