> On 25 Feb 2018, at 20:50, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote: > > On Sat, Feb 24, 2018 at 11:28 AM, <lars.schneider@xxxxxxxxxxxx> wrote: >> UTF supports lossless conversion round tripping and conversions between >> UTF and other encodings are mostly round trip safe as Unicode aims to be >> a superset of all other character encodings. However, certain encodings >> (e.g. SHIFT-JIS) are known to have round trip issues [1]. >> >> Add 'core.checkRoundtripEncoding', which contains a comma separated >> list of encodings, to define for what encodings Git should check the >> conversion round trip if they are used in the 'working-tree-encoding' >> attribute. >> >> Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'. >> >> [1] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode >> >> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx> >> --- >> diff --git a/convert.c b/convert.c >> @@ -289,6 +289,39 @@ static void trace_encoding(const char *context, const char *path, >> +static int check_roundtrip(const char* enc_name) >> +{ >> + /* >> + * check_roundtrip_encoding contains a string of space and/or >> + * comma separated encodings (eg. "UTF-16, ASCII, CP1125"). >> + * Search for the given encoding in that string. >> + */ >> + const char *found = strcasestr(check_roundtrip_encoding, enc_name); >> + const char *next = found + strlen(enc_name); > > Is this pointer arithmetic undefined behavior (according to the C > standard) if NULL is returned by strcasestr()? If so, how comfortable > are we with this? Perhaps if you add an 'if' into the mix, you can > avoid it: > > if (found) { > const char *next = found + strlen(enc_name); > return ...long complicated expression...; > } > return false; OK. I've fixed it this way: if (!found) return 0; [...] >> + >> + if (!re_src || src_len != re_src_len || >> + memcmp(src, re_src, src_len)) { >> + const char* msg = _("encoding '%s' from %s to %s and " >> + "back is not the same"); >> + die(msg, path, enc->name, default_encoding); > > Last two arguments need to be swapped. Hm. Are you sure? I think it is correct as it is. We are in encode_to_git() here and that means we encode *to* "default encoding", no? >> + } >> + >> + free(re_src); >> + } >> + >> strbuf_attach(buf, dst, dst_len, dst_len + 1); >> return 1; >> } >> diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh >> @@ -225,4 +225,45 @@ test_expect_success 'error if encoding garbage is already in Git' ' >> +test_expect_success 'check roundtrip encoding' ' >> + text="hallo there!\nroundtrip test here!" && >> + printf "$text" | iconv -f UTF-8 -t SHIFT-JIS >roundtrip.shift && >> + printf "$text" | iconv -f UTF-8 -t UTF-16 >roundtrip.utf16 && >> + echo "*.shift text working-tree-encoding=SHIFT-JIS" >>.gitattributes && >> + >> + # SHIFT-JIS encoded files are round-trip checked by default... >> + GIT_TRACE=1 git add .gitattributes roundtrip.shift 2>&1 >/dev/null | >> + grep "Checking roundtrip encoding for SHIFT-JIS" && > > Why redirect to /dev/null? If something does go wrong somewhere, the > more output available for debugging the problem, the better, so > throwing it away unnecessarily seems contraindicated. OK! >> + git reset && >> + >> + # ... unless we overwrite the Git config! >> + test_config core.checkRoundtripEncoding "garbage" && >> + ! GIT_TRACE=1 git add .gitattributes roundtrip.shift 2>&1 >/dev/null | >> + grep "Checking roundtrip encoding for SHIFT-JIS" && >> + test_unconfig core.checkRoundtripEncoding && > > The "unconfig" won't take place if the test fails. Instead of > test_config/test_unconfig, you could use '-c' to set the config > transiently for the git-add operation: > > ! GIT_TRACE=1 git -c core.checkRoundtripEncoding=garbage add ... Agreed. Although test_config (in t/test-lib-functions.sh) automatically unsets itself after the test is over. >> + git reset && >> + >> + # UTF-16 encoded files should not be round-trip checked by default... >> + ! GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null | >> + grep "Checking roundtrip encoding for UTF-16" && >> + git reset && >> + >> + # ... unless we tell Git to check it! >> + test_config_global core.checkRoundtripEncoding "UTF-16, UTF-32" && >> + GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null | >> + grep "Checking roundtrip encoding for UTF-16" && >> + git reset && >> + >> + # ... unless we tell Git to check it! >> + # (here we also check that the casing of the encoding is irrelevant) >> + test_config_global core.checkRoundtripEncoding "UTF-32, utf-16" && >> + GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null | >> + grep "Checking roundtrip encoding for UTF-16" && >> + git reset && >> + >> + # cleanup >> + rm roundtrip.shift roundtrip.utf16 && >> + git reset --hard HEAD >> +' > > Same comment as for patch 5/7: This cleanup won't happen if the test > fails. Instead, use test_when_finished() earlier in the test: > > test_expect_success 'check roundtrip encoding' ' > test_when_finished "rm -f roundtrip.shift roundtrip.utf16; git > reset --hard HEAD" && > ... Agreed! Thanks you, Lars