On Tue, Jan 12 2021, Johannes Schindelin via GitGitGadget wrote: > This patch series is incomplete because I lack the time to finish it, and I > hope that there are interested developers who can help with it. > > In https://lore.kernel.org/git/20181022153633.31757-1-pclouds@xxxxxxxxx/, > Duy proposed introducing a fake language they called "Ook" to be able to > test translations better. "Ook" would "translate" messages without any > printf format specifiers to "Eek", otherwise convert all upper-case letters > to "Ook" and lower-case letters to "ook". > > Gábor replied with a different proposal that has the advantage of being > bijective, i.e. it is possible to reconstruct the untranslated message from > the translated one. > > Ævar suggested recently > [https://lore.kernel.org/git/xmqqim836v6m.fsf@xxxxxxxxxxxxxxxxxxxxxx/T/#m6fdc43d4f1eb3f20f841096c59e985b69c84875e] > that this whole GETTEXT_POISON business is totally useless. To clarify what I really mean, but admittedly am not always the best at articulating: I do not think GETTEXT_POISON is useless, it's useful. The interesting question is "useful enough to bother?", or in economics terms "is this investment in time/money worth the opportunity cost?". Any hoops we make people jump through when writing tests takes away time and attention from other things. My motivation here is that I feel this whole poison business is all my fault. Every time some newbie submits a patch and needs to re-roll a v2 because they used "grep" instead of "test_i18ngrep" in a case that clearly didn't involve a plumbing string I cringe a little about the technical debt. So while I still think my series (just nuke it) would be better overall, I'm secretly rooting for yours. If the consensus is that this is worth keeping and improving perhaps we haven't been mostly wasting our time :) On to discussing this series: > I do not believe that it is useless. To back up my belief with something > tangible, I implemented a GETTEXT_POISON=rot13 mode and ran the test suite > to see whether we erroneously expect translated messages where they aren't, > and related problems. I agree that rot13ing it makes it much more useful in the sense that before we could over-tag things with test_i18n* and we'd just "return 0" there in the poison mode, now we actually check if the string is poisoned. We could get most of the way there by checking e.g. if "GETTEXT POISON" is in the output, but it wouldn't assert that the message is 100% translated as this WIP series does. In our off-list discussion you maintained that "GIT_TEST_GETTEXT_POISON=false <cmd>" was an anti-pattern. Was it because you thought "test_i18n*" et al was actually doing some "is poison?" comparison as we now do, or just because you didn't conceptually like a pattern like: GIT_TEST_GETTEXT_POISON=false git cmd >out && grep string out Being changed later to: GIT_TEST_GETTEXT_POISON=false git cmd >out && grep string out && grep plumbing-string out Or whatever, as opposed to: git cmd >out && test_i18ngrep string out && test_i18ngrep plumbing-string out I get the aesthetic preference, I'm just wondering if I'm missing something else about it... > And the experiment delivered. It is just a demonstration (I only addressed a > handful of the test suite failures, 124 test scripts still need to be > inspected to find out why they fail), of course. Yet I think that finding > e.g. the missing translations of sha1dc-cb and parse-options show that it > would be very useful to complete this patch series and then use the rot13 > mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which > really is less useful). The following patch on top cuts down on the failures by more than half: ----------------- diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh index 8eef60b43f..f9239d2917 100644 --- a/git-sh-i18n.sh +++ b/git-sh-i18n.sh @@ -17,7 +17,10 @@ export TEXTDOMAINDIR # First decide what scheme to use... GIT_INTERNAL_GETTEXT_SH_SCHEME=fallthrough -if test -n "$GIT_TEST_GETTEXT_POISON" && +if test -n "$GIT_TEST_GETTEXT_POISON" -a "$GIT_TEST_GETTEXT_POISON" = "rot13" +then + GIT_INTERNAL_GETTEXT_SH_SCHEME=rot13poison +elif test -n "$GIT_TEST_GETTEXT_POISON" && git env--helper --type=bool --default=0 --exit-code \ GIT_TEST_GETTEXT_POISON then @@ -63,6 +66,21 @@ gettext_without_eval_gettext) ) } ;; +rot13poison) + # Emit garbage so that tests that incorrectly rely on translatable + # strings will fail. + gettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + + eval_gettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + + eval_ngettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + ;; poison) # Emit garbage so that tests that incorrectly rely on translatable # strings will fail. diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c index 82efc66f1f..1f0fa3d041 100644 --- a/t/helper/test-i18n.c +++ b/t/helper/test-i18n.c @@ -1,6 +1,8 @@ #include "test-tool.h" #include "cache.h" #include "grep.h" +#include "config.h" static const char *usage_msg = "\n" " test-tool i18n cmp <file1> <file2>\n" @@ -34,8 +36,12 @@ static size_t unrot13(char *buf) p += strlen("<rot13>"); end = strstr(p, "</rot13>"); - if (!end) - BUG("could not find </rot13> in\n%s", buf); + if (!end) { + if (git_env_bool("GIT_TEST_GETTEXT_POISON_PEDANTIC", 0)) + BUG("could not find </rot13> in\n%s", buf); + else + break; + } while (p != end) *(q++) = do_rot13(*(p++)); @@ -67,8 +73,12 @@ static int i18n_cmp(const char **argv) if (strbuf_read_file(&a, path1, 0) < 0) die_errno("could not read %s", path1); + if (strstr(a.buf, "# SHELL GETTEXT POISON #")) + return 0; if (strbuf_read_file(&b, path2, 0) < 0) die_errno("could not read %s", path2); + if (strstr(b.buf, "# SHELL GETTEXT POISON #")) + return 0; unrot13_strbuf(&b); if (a.len != b.len || memcmp(a.buf, b.buf, a.len)) @@ -79,7 +89,6 @@ static int i18n_cmp(const char **argv) static int i18n_grep(const char **argv) { - int dont_match = 0; const char *pattern, *path; struct grep_opt opt; @@ -87,11 +96,6 @@ static int i18n_grep(const char **argv) struct strbuf buf = STRBUF_INIT; int hit; - if (*argv && !strcmp("!", *argv)) { - dont_match = 1; - argv++; - } - pattern = *(argv++); path = *(argv++); @@ -104,13 +108,15 @@ static int i18n_grep(const char **argv) if (strbuf_read_file(&buf, path, 0) < 0) die_errno("could not read %s", path); + if (strstr(buf.buf, "# SHELL GETTEXT POISON #")) + return 0; unrot13_strbuf(&buf); grep_source_init(&source, GREP_SOURCE_BUF, path, path, path); source.buf = buf.buf; source.size = buf.len; hit = grep_source(&opt, &source); strbuf_release(&buf); - return dont_match ^ !hit; + return !hit; } int cmd__i18n(int argc, const char **argv) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 394fd2ef5b..6c580a3000 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1019,14 +1019,12 @@ test_i18ngrep () { BUG "too few parameters to test_i18ngrep" fi - if test_have_prereq !C_LOCALE_OUTPUT + grep_cmd=grep + if test "$GIT_TEST_GETTEXT_POISON" == "rot13" + then + grep_cmd="test-tool i18n grep" + elif test_have_prereq !C_LOCALE_OUTPUT then - if test rot13 = "$GIT_TEST_GETTEXT_POISON" - then - test-tool i18n grep "$@" - return $? - fi - # pretend success return 0 fi @@ -1034,13 +1032,13 @@ test_i18ngrep () { if test "x!" = "x$1" then shift - ! grep "$@" && return 0 + ! $grep_cmd "$@" && return 0 - echo >&4 "error: '! grep $@' did find a match in:" + echo >&4 "error: '! $grep_cmd $@' did find a match in:" else - grep "$@" && return 0 + $grep_cmd "$@" && return 0 - echo >&4 "error: 'grep $@' didn't find a match in:" + echo >&4 "error: '$grep_cmd $@' didn't find a match in:" fi if test -s "$last_arg" ----------------- I.e. most of this was because you didn't add any support for this to the shellscript translations. We still punt on that here, it should ideally preserve the shell variables like the format specifiers, but I think that's not worth the effort with us actively cutting down our shell code to nothing. We still have failures e.g. due to "test_i18ngrep -F fixed file" not being supported. Wouldn't a better/simpler approach be to just have a light rot13 helper, and then call the actual "cmp"/"grep" with the munged input/output?