On Fri, Feb 08, 2019 at 09:50:07AM -0800, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>
> >> So would you suggest that we just skip this test on Alpine Linux?
> >
> > That's not exactly what I said. If Alpine Linux users are never going
> > to use this functionality and don't care that it's broken, then
> > that's a fine solution.
> >
> > As originally mentioned, musl could change its libiconv to write a
> > BOM, which would make it compatible with other known iconv
> > implementations.
> >
> > There's also the possibility of defining NO_ICONV. That basically
> > means that your system won't support encodings, and then this test
> > shouldn't matter.
> >
> > Finally, you could try applying a patch to the test to make it write
> > the BOM for UTF-16 since your iconv doesn't. I expect that the test
> > will fail again later on once you've done that, though.
>
> Sorry for being late to the party, but is the crux of the issue this
> piece early in the test?
>
>     printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
>     ...
>     cp test.utf16.raw test.utf16 &&
>     ...
>     git add .gitattributes test.utf16 test.utf16lebom &&
>
> where we expect "iconv -t UTF-16" to mean "write UTF-16 in whatever
> byte order of your choice, but do write a BOM", and the iconv
> implementations we have seen so far are in line with that
> expectation, but the one on Alpine writes UTF-16 in big endian
> without a BOM?

Firstly, the tests expect "iconv -t UTF-16" to output a BOM, which it indeed does not do on Alpine. Secondly, git itself also expects the BOM to be present when the encoding is set to UTF-16; otherwise it will complain.
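For concreteness, here is one way to check which variant a given system's iconv produces; "hallo there" is just a stand-in sample string, not the test's actual $text:

```shell
# Sketch: inspect the first two bytes of "iconv -t UTF-16" output.
printf 'hallo there\n' | iconv -f UTF-8 -t UTF-16 >probe.utf16

# The leading bytes identify the variant:
#   ff fe -> little endian with BOM (what glibc's iconv writes)
#   fe ff -> big endian with BOM
#   anything else -> no BOM (musl writes big-endian, BOM-less)
head -c 2 probe.utf16 | od -An -tx1
```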
> If that is the case, I think it is our expectation that is at fault
> here, as I think the most natural interpretation of "UTF-16" without
> any modifiers (like "BE") ought to be "a UTF-16 stream expressed in
> any way of the writer's choice, as long as it is readable by
> standard-compliant readers"; in other words, "write UTF-16 in
> whatever byte order of your choice, with or without a BOM, but if you
> omit the BOM, you SHOULD write in big endian". So
>
>  - If our later test assumes that test.utf16 is UTF-16 with a BOM,
>    that already assumes too much;
>
>  - If our later test assumes that test.utf16 is UTF-16 in big endian,
>    that assumes too much, too.
>
> As suggested earlier in the thread, the easiest workaround would be
> to update the preparation of test.utf16.raw to force big endian with
> a BOM by prepending the BE BOM by hand before the "iconv -t UTF-16BE"
> output (I am assuming that UTF-16BE will stay "big endian without a
> BOM" in the future). That would make sure that the assumption later
> tests make about test.utf16 holds true.

I tried changing the test to manually inject a BOM into the file (and setting iconv to UTF-16LE / UTF-16BE), which lets the first test pass, but test 3 then fails, because git itself outputs the file without a BOM, presumably because the content is passed through iconv. So I'm not sure it's a matter of just fixing the tests.
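Concretely, the kind of injection I tried looks roughly like this (with a stand-in sample string instead of the test's actual $text; the BOM bytes are written in octal so plain POSIX printf handles them):

```shell
# Sketch of the workaround: write a big-endian BOM (fe ff, octal 376 377)
# by hand, then append explicitly big-endian iconv output, which is
# BOM-less on every implementation. The result should be byte-identical
# on glibc and musl.
printf '\376\377' >test.utf16 &&
printf 'hallo there\n' | iconv -f UTF-8 -t UTF-16BE >>test.utf16
```

This makes the input file deterministic, but as noted above it does not help with test 3, where git re-emits the content without a BOM.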