On 2021-06-02 20:50:39+0700, Đoàn Trần Công Danh <congdanhqx@xxxxxxxxx> wrote:
> On 2021-06-02 15:36:57+0200, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> > That's getting us there, now we don't fail on the 2nd test, but do start
> > failing on the third "re-encode to UTF-16 on checkout" and other
> > "checkout" tests.
> >
> > The "test_cmp" at the end of that 3rd tests shows that the difference in
> > test.utf16.raw and test.utf16 is now that the "raw" one has the BOM, but
> > not the "test.utf16" file.
>
> That meant we need: ICONV_OMITS_BOM=UnfortunatelyYes for AIX?
> I can replicate that test failure when building for musl libc without
> ICONV_OMITS_BOM undefined.

After applying my patch and building with ICONV_OMITS_BOM=Yes, t0028.3
passes, but t0028.4 and t0028.21 fail.

Here is a dump of the first 10 bytes of test.utf16lebom:

'0xff', '0xfe', '0xfe', '0xff', '0x0', '0x68', '0x0', '0x61', '0x0', '0x6c',

Digging a bit more, it seems that the iconv(3) conversion from
utf-16-le-bom to utf-8 is broken there: iconv(3) behaves as if it were
converting from utf-16-be to utf-8:

source (test.utf16lebom, considered UTF-16LE-BOM):
| 0: ff
| 1: fe
| 2: 68 h
| 3: 0
| 4: 61 a
| 5: 0
| 6: 6c l
| 7: 0
| 8: 6c l
| 9: 0
| 10: 6f o
| 11: 0
| 12: 20
| 13: 0
| 14: 74 t
| 15: 0
| 16: 68 h
| 17: 0
| 18: 65 e
| 19: 0
| 20: 72 r
| 21: 0
| 22: 65 e
| 23: 0
| 24: 21 !
| 25: 0
| 26: a
| 27: 0
| 28: 63 c
| 29: 0
| 30: 61 a
| 31: 0
| 32: 6e n
| 33: 0
| 34: 20
| 35: 0
| 36: 79 y
| 37: 0
| 38: 6f o
| 39: 0
| 40: 75 u
| 41: 0
| 42: 20
| 43: 0
| 44: 72 r
| 45: 0
| 46: 65 e
| 47: 0
| 48: 61 a
| 49: 0
| 50: 64 d
| 51: 0
| 52: 20
| 53: 0
| 54: 6d m
| 55: 0
| 56: 65 e
| 57: 0
| 58: 3f ?
| 59: 0

destination (test.utf16lebom, considered UTF-8):
| 0: ef
| 1: bf
| 2: be
| 3: e6
| 4: a0
| 5: 80
| 6: e6
| 7: 84
| 8: 80
| 9: e6
| 10: b0
| 11: 80
| 12: e6
| 13: b0
| 14: 80
| 15: e6
| 16: bc
| 17: 80
| 18: e2
| 19: 80
| 20: 80
| 21: e7
| 22: 90
| 23: 80
| 24: e6
| 25: a0
| 26: 80
| 27: e6
| 28: 94
| 29: 80
| 30: e7
| 31: 88
| 32: 80
| 33: e6
| 34: 94
| 35: 80
| 36: e2
| 37: 84
| 38: 80
| 39: e0
| 40: a8
| 41: 80
| 42: e6
| 43: 8c
| 44: 80
| 45: e6
| 46: 84
| 47: 80
| 48: e6
| 49: b8
| 50: 80
| 51: e2
| 52: 80
| 53: 80
| 54: e7
| 55: a4
| 56: 80
| 57: e6
| 58: bc
| 59: 80
| 60: e7
| 61: 94
| 62: 80
| 63: e2
| 64: 80
| 65: 80
| 66: e7
| 67: 88
| 68: 80
| 69: e6
| 70: 94
| 71: 80
| 72: e6
| 73: 84
| 74: 80
| 75: e6
| 76: 90
| 77: 80
| 78: e2
| 79: 80
| 80: 80
| 81: e6
| 82: b4
| 83: 80
| 84: e6
| 85: 94
| 86: 80
| 87: e3
| 88: bc
| 89: 80

--
Danh
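
P.S. The byte-order reading of the two dumps can be cross-checked without any iconv(3) implementation. Below is a minimal Python sketch, not part of the patch, that rebuilds the 60 source bytes and deliberately misreads them as UTF-16BE. The text string is reconstructed from the source dump, and the names TEXT, source, misread, expected and dump are illustrative only. It models the symptom, not the actual cause inside the affected iconv(3).

```python
# Text reconstructed from the source dump (bytes 2..59 read as UTF-16LE).
TEXT = "hallo there!\ncan you read me?"

# Bytes as dumped for test.utf16lebom: little-endian BOM, then UTF-16LE.
source = b"\xff\xfe" + TEXT.encode("utf-16-le")

# Assumed model of the faulty conversion: treat every 2-byte unit,
# including the BOM, as big-endian, then encode the result as UTF-8.
misread = source.decode("utf-16-be").encode("utf-8")

# What a correct conversion would yield: skip the BOM, decode little-endian.
expected = source[2:].decode("utf-16-le").encode("utf-8")


def dump(data):
    """Return the bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


print(dump(misread[:12]))  # ef bf be e6 a0 80 e6 84 80 e6 b0 80
print(len(source), len(misread))  # 60 90
print(expected)
```

The first twelve bytes printed match offsets 0 to 11 of the destination dump, and the lengths (60 in, 90 out) match as well, which is consistent with the bytes having been read as big-endian.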