Re: [PATCH 00/11] [RFH] Introduce support for GETTEXT_POISON=rot13

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 15 Jan 2021 17:13:25 +0100 (CET)

Hi Ævar,

On Tue, 12 Jan 2021, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Jan 12 2021, Johannes Schindelin via GitGitGadget wrote:
>
> > I implemented a GETTEXT_POISON=rot13 mode and ran the test suite to
> > see whether we erroneously expect translated messages where they
> > aren't, and related problems.
>
> I agree that rot13ing it makes it much more useful in the sense that
> before we could over-tag things with test_i18n* and we'd just "return 0"
> there in the poison mode, now we actually check if the string is
> poisoned.

Precisely.

> In our off-list discussion you maintained that
> "GIT_TEST_GETTEXT_POISON=false <cmd>" was an anti-pattern. Was it
> because you thought "test_i18n*" et al was actually doing some "is
> poison?" [...]

I maintain this view because the whole point of having a GETTEXT_POISON
job is to have the entire test suite run in that mode. Using this
anti-pattern says to me "the author could not make this work correctly in
GETTEXT_POISON mode".

Even more, _iff_ we expect certain validations to be skipped in the
GETTEXT_POISON job, using `GIT_TEST_GETTEXT_POISON=false` opens the
possibility that this job fails when it never should have failed.

> > And the experiment delivered. It is just a demonstration (I only addressed a
> > handful of the test suite failures, 124 test scripts still need to be
> > inspected to find out why they fail), of course. Yet I think that finding
> > e.g. the missing translations of sha1dc-cb and parse-options show that it
> > would be very useful to complete this patch series and then use the rot13
> > mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which
> > really is less useful).
>
> The following patch on top cuts down on the failures by more than half:

Interesting! I forgot the shell half... Curious that the scripted parts
_still_ affect so much of our test suite :-(

> -----------------
> diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh
> index 8eef60b43f..f9239d2917 100644
> --- a/git-sh-i18n.sh
> +++ b/git-sh-i18n.sh
> @@ -17,7 +17,10 @@ export TEXTDOMAINDIR
>
>  # First decide what scheme to use...
>  GIT_INTERNAL_GETTEXT_SH_SCHEME=fallthrough
> -if test -n "$GIT_TEST_GETTEXT_POISON" &&
> +if test -n "$GIT_TEST_GETTEXT_POISON" -a "$GIT_TEST_GETTEXT_POISON" = "rot13"
> +then
> +	GIT_INTERNAL_GETTEXT_SH_SCHEME=rot13poison
> +elif  test -n "$GIT_TEST_GETTEXT_POISON" &&
>  	    git env--helper --type=bool --default=0 --exit-code \
>  		GIT_TEST_GETTEXT_POISON
>  then
> @@ -63,6 +66,21 @@ gettext_without_eval_gettext)
>  		)
>  	}
>  	;;
> +rot13poison)
> +	# Emit garbage so that tests that incorrectly rely on translatable
> +	# strings will fail.
> +	gettext () {
> +		printf "%s" "# SHELL GETTEXT POISON #"
> +	}
> +
> +	eval_gettext () {
> +		printf "%s" "# SHELL GETTEXT POISON #"
> +	}
> +
> +	eval_ngettext () {
> +		printf "%s" "# SHELL GETTEXT POISON #"
> +	}

Right, and for that, I'd rather use the `test-tool i18n` helper (it is
appropriate to expect `test-tool` to be in the `PATH` in GETTEXT_POISON
mode, right?)

> +	;;
>  poison)
>  	# Emit garbage so that tests that incorrectly rely on translatable
>  	# strings will fail.
> diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c
> index 82efc66f1f..1f0fa3d041 100644
> --- a/t/helper/test-i18n.c
> +++ b/t/helper/test-i18n.c
> @@ -1,6 +1,8 @@
>  #include "test-tool.h"
>  #include "cache.h"
>  #include "grep.h"
> +#include "config.h"
>
>  static const char *usage_msg = "\n"
>  "  test-tool i18n cmp <file1> <file2>\n"
> @@ -34,8 +36,12 @@ static size_t unrot13(char *buf)
>
>  		p += strlen("<rot13>");
>  		end = strstr(p, "</rot13>");
> -		if (!end)
> -			BUG("could not find </rot13> in\n%s", buf);
> +		if (!end) {
> +			if (git_env_bool("GIT_TEST_GETTEXT_POISON_PEDANTIC", 0))
> +				BUG("could not find </rot13> in\n%s", buf);
> +			else
> +				break;
> +		}
>
>  		while (p != end)
>  			*(q++) = do_rot13(*(p++));
> @@ -67,8 +73,12 @@ static int i18n_cmp(const char **argv)
>
>  	if (strbuf_read_file(&a, path1, 0) < 0)
>  		die_errno("could not read %s", path1);
> +	if (strstr(a.buf, "# SHELL GETTEXT POISON #"))
> +		return 0;
>  	if (strbuf_read_file(&b, path2, 0) < 0)
>  		die_errno("could not read %s", path2);
> +	if (strstr(b.buf, "# SHELL GETTEXT POISON #"))
> +		return 0;
>  	unrot13_strbuf(&b);
>
>  	if (a.len != b.len || memcmp(a.buf, b.buf, a.len))
> @@ -79,7 +89,6 @@ static int i18n_cmp(const char **argv)
>
>  static int i18n_grep(const char **argv)
>  {
> -	int dont_match = 0;
>  	const char *pattern, *path;
>
>  	struct grep_opt opt;
> @@ -87,11 +96,6 @@ static int i18n_grep(const char **argv)
>  	struct strbuf buf = STRBUF_INIT;
>  	int hit;
>
> -	if (*argv && !strcmp("!", *argv)) {
> -		dont_match = 1;
> -		argv++;
> -	}
> -
>  	pattern = *(argv++);
>  	path = *(argv++);
>
> @@ -104,13 +108,15 @@ static int i18n_grep(const char **argv)
>
>  	if (strbuf_read_file(&buf, path, 0) < 0)
>  		die_errno("could not read %s", path);
> +	if (strstr(buf.buf, "# SHELL GETTEXT POISON #"))
> +		return 0;
>  	unrot13_strbuf(&buf);
>  	grep_source_init(&source, GREP_SOURCE_BUF, path, path, path);
>  	source.buf = buf.buf;
>  	source.size = buf.len;
>  	hit = grep_source(&opt, &source);
>  	strbuf_release(&buf);
> -	return dont_match ^ !hit;
> +	return !hit;
>  }
>
>  int cmd__i18n(int argc, const char **argv)
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 394fd2ef5b..6c580a3000 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1019,14 +1019,12 @@ test_i18ngrep () {
>  		BUG "too few parameters to test_i18ngrep"
>  	fi
>
> -	if test_have_prereq !C_LOCALE_OUTPUT
> +	grep_cmd=grep
> +	if test "$GIT_TEST_GETTEXT_POISON" == "rot13"
> +	then
> +		grep_cmd="test-tool i18n grep"
> +	elif test_have_prereq !C_LOCALE_OUTPUT
>  	then
> -		if test rot13 = "$GIT_TEST_GETTEXT_POISON"
> -		then
> -			test-tool i18n grep "$@"
> -			return $?
> -		fi
> -
>  		# pretend success
>  		return 0
>  	fi
> @@ -1034,13 +1032,13 @@ test_i18ngrep () {
>  	if test "x!" = "x$1"
>  	then
>  		shift
> -		! grep "$@" && return 0
> +		! $grep_cmd "$@" && return 0
>
> -		echo >&4 "error: '! grep $@' did find a match in:"
> +		echo >&4 "error: '! $grep_cmd $@' did find a match in:"
>  	else
> -		grep "$@" && return 0
> +		$grep_cmd "$@" && return 0
>
> -		echo >&4 "error: 'grep $@' didn't find a match in:"
> +		echo >&4 "error: '$grep_cmd $@' didn't find a match in:"
>  	fi
>
>  	if test -s "$last_arg"
> -----------------
>
> I.e. most of this was because you didn't add any support for this to the
> shellscript translations.
>
> We still punt on that here, it should ideally preserve the shell
> variables like the format specifiers, but I think that's not worth the
> effort with us actively cutting down our shell code to nothing.

I'm not sure how fast we'll get there, what with `git submodule` being
slow to get over the finish line, and with `mergetool`/`difftool` still
being scripted, and probably for a long time, too.

> We still have failures e.g. due to "test_i18ngrep -F fixed file" not
> being supported. Wouldn't a better/simpler approach be to just have a
> light rot13 helper, and then call the actual "cmp"/"grep" with the
> munged input/output?

Hmm. I hoped that this work would open the door to introducing more C
helpers in the test suite, thereby accelerating it on Windows.

We could also potentially move the `cmp`/`grep` parts into separate
helpers that give more useful output in case of a failure. (Right now, if
anything fails in CI involving `grep` or `cmp`, the user investigating
this pretty much _has_ to try to reproduce the issue locally because the
output is so not helpful, and reproducing it might be a challenge e.g.
when the failure is macOS-specific and the developer does not have access
to a macOS setup).

Ciao,
Dscho