On Mon, Sep 23, 2019 at 03:04:19AM -0700, Alexandr Miloslavskiy via GitGitGadget wrote: > From: Alexandr Miloslavskiy <alexandr.miloslavskiy@xxxxxxxxxxx> Thanks for the tests, some nit-picks inline. > > After I discovered that UTF-16-LE-BOM test was bugged and still > succeeded... My interpretation is that the \000\000 must be handled correctly on all platforms, and that seems to be the case. Would this make more sense: After I discovered that UTF-16-LE-BOM test was bugged, I decided that better tests are required > ... I decided that better tests are required. Possibly the best > option here is to compare git results against hardcoded ground truth. > > The new tests also cover more interesting chars where (ANSI != UTF-8). > > Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@xxxxxxxxxxx> > --- > t/t0028-working-tree-encoding.sh | 39 ++++++++++++++++++++++++++++++++ > 1 file changed, 39 insertions(+) > > diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh > index 5493cf3ca9..d0dd5dd0ea 100755 > --- a/t/t0028-working-tree-encoding.sh > +++ b/t/t0028-working-tree-encoding.sh > @@ -280,4 +280,43 @@ test_expect_success ICONV_SHIFT_JIS 'check roundtrip encoding' ' > git reset > ' > > +# $1: checkout encoding > +# $2: test string > +# $3: binary test string in checkout encoding > +test_commit_utf8_checkout_other () { > + encoding="$1" > + orig_string="$2" > + expect_bytes="$3" > + > + test_expect_success "Commit utf-8, checkout ${encoding}" ' General remark: Do we need the {} here? ${encoding} could be simpler written as $encoding > + test_when_finished "git checkout HEAD -- .gitattributes" && > + > + test_ext="commit_utf8_checkout_${encoding}" && > + test_file="test.${test_ext}" && > + > + # Commit as utf-8 Another nit-pick: Looking at the other test cases, should utf-8 be written as UTF-8 for consistency ? > + echo "*.${test_ext} text working-tree-encoding=utf-8" >.gitattributes && > + printf "${orig_string}" >"${test_file}" && > + git add "${test_file}" && > + git commit -m "Test data" && > + > + # Checkout in tested encoding > + rm "${test_file}" && > + echo "*.${test_ext} text working-tree-encoding=${encoding}" >.gitattributes && > + git checkout HEAD -- "${test_file}" && > + > + # Test > + printf "${expect_bytes}" > "${test_file}.raw" && > + test_cmp_bin "${test_file}.raw" "${test_file}" More a style-nit: could we simply write like this: printf $expect_bytes > $test_file.raw && test_cmp_bin $test_file.raw $test_file (Even on other places) > + ' > +} > + > +test_commit_utf8_checkout_other "UTF-8" "Test Тест" "\124\145\163\164\040\320\242\320\265\321\201\321\202" > +test_commit_utf8_checkout_other "UTF-16LE" "Test Тест" "\124\000\145\000\163\000\164\000\040\000\042\004\065\004\101\004\102\004" > +test_commit_utf8_checkout_other "UTF-16BE" "Test Тест" "\000\124\000\145\000\163\000\164\000\040\004\042\004\065\004\101\004\102" > +test_commit_utf8_checkout_other "UTF-16LE-BOM" "Test Тест" "\377\376\124\000\145\000\163\000\164\000\040\000\042\004\065\004\101\004\102\004" > +test_commit_utf8_checkout_other "UTF-16BE-BOM" "Test Тест" "\376\377\000\124\000\145\000\163\000\164\000\040\004\042\004\065\004\101\004\102" > +test_commit_utf8_checkout_other "UTF-32LE" "Test Тест" "\124\000\000\000\145\000\000\000\163\000\000\000\164\000\000\000\040\000\000\000\042\004\000\000\065\004\000\000\101\004\000\000\102\004\000\000" > +test_commit_utf8_checkout_other "UTF-32BE" "Test Тест" "\000\000\000\124\000\000\000\145\000\000\000\163\000\000\000\164\000\000\000\040\000\000\004\042\000\000\004\065\000\000\004\101\000\000\004\102" > + > test_done > -- > gitgitgadget >