On Mon, Oct 22 2018, Ævar Arnfjörð Bjarmason wrote: > On Mon, Oct 22 2018, Nguyễn Thái Ngọc Duy wrote: > >> The current gettext() function just replaces all strings with >> '# GETTEXT POISON #' including format strings and hides the things >> that we should be allowed to grep (like branch names, or some other >> codes) even when gettext is poisoned. >> >> This patch implements the poisoned _() with a universal and totally >> legit language called Ook [1]. We could actually grep stuff even in >> with this because format strings are preserved. >> >> Long term, we could implement an "ook translator" for test_i18ngrep >> and friends so that they translate English to Ook, allowing us to >> match full text while making sure the text in the code is still marked >> for translation. > > Replying to both this patch, and SZEDER's series for brevity. Thanks > both for working on this. It's obvious that this stuff needs some > polishing, and it's great that's being done, and sorry for my part of > having left this in the current state. > > a) > > Both this patch and SZEDER's 7/8 are addressing the issue that the > poison mode doesn't preserve format specifiers. > > I haven't tried to dig this up in the list archive, and maybe it only > exists in my mind, but I seem to remember that this was explicitly > discussed as a goal at the time. > > I.e. we were expecting the lack of this to break tests, and that was > considered a feature as it would spot plumbing messages we shouldn't be > translating. > > However, it's my opinion that this was just a bad idea to begin with, > I've CC'd a couple of prolific markers of i18n from my log grepping (and > feel free to add more) to see if they disagre. > > I think it probably helped me a bit in the beginning when i18n was > bootstrapping and there was a *lot* to mark for translation, but we've > long since stabilized and aren't doing that anymore, so it's much easier > to just review the patches to see if they translate plumbing. > > All of which is to say that I think something like your patch here is > fine to just accept, and the only improvement I'd suggest is some note > in the commit message saying that this was always intentional, but > nobody can name a case where it helped, so let's just drop it. > > In SZEDER's case that we shouldn't have GIT_GETTEXT_POISON=scrambled, > let's just make it boolean and make "scrambled" (i.e. preserving format > specifiecs) the default. [...] > b) > > SZEDER's series, and your patch (although it's smaller) still preserve > the notion that there's such a thing as a GETTEXT_POISON *build* and you > actually need to compile git with that flag to make use of it. > > This, as I recall, was just paranoia on my part back in 2010/2011. I > wanted to be able to present a version of this i18n stuff where people > who didn't like it could be assured that if they didn't opt-in to it > they wouldn't get any of the code (sans a few trivial and obviously > correct wrapper functions). > > So I think the only reason to keep it compile-time is performance, but I > don't think that matters. It's not like we're printing gigabytes of _() > formatted output. Everything where formatting matters is plumbing which > doesn't use this API. These messages are always for human consumption. > > So shouldn't we just drop this notion of GETTEXT_POISON at compile-time > entirely, and make GIT_GETTEXT_POISON=<boolean> a test flag that all > builds of git are aware of? I think so. Something like this, untested except I compiled it and ran one test with/without GIT_GETTEXT_POISON, so it may have bugs: Makefile | 10 +--------- gettext.c | 4 +--- gettext.h | 4 ---- po/README | 13 ++++--------- t/README | 5 +++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/test-lib-functions.sh | 8 ++++---- t/test-lib.sh | 12 +++--------- 9 files changed, 20 insertions(+), 38 deletions(-) diff --git a/Makefile b/Makefile index d18ab0fe78..78190ee9d9 100644 --- a/Makefile +++ b/Makefile @@ -362,11 +362,6 @@ all:: # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # -# Define GETTEXT_POISON if you are debugging the choice of strings marked -# for translation. In a GETTEXT_POISON build, you can turn all strings marked -# for translation into gibberish by setting the GIT_GETTEXT_POISON variable -# (to any value) in your environment. -# # Define JSMIN to point to JavaScript minifier that functions as # a filter to have gitweb.js minified. # @@ -714,6 +709,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o TEST_BUILTINS_OBJS += test-dump-untracked-cache.o TEST_BUILTINS_OBJS += test-example-decorate.o TEST_BUILTINS_OBJS += test-genrandom.o +TEST_BUILTINS_OBJS += test-gettext-poison.o TEST_BUILTINS_OBJS += test-hashmap.o TEST_BUILTINS_OBJS += test-index-version.o TEST_BUILTINS_OBJS += test-json-writer.o @@ -1439,9 +1435,6 @@ endif ifdef NO_SYMLINK_HEAD BASIC_CFLAGS += -DNO_SYMLINK_HEAD endif -ifdef GETTEXT_POISON - BASIC_CFLAGS += -DGETTEXT_POISON -endif ifdef NO_GETTEXT BASIC_CFLAGS += -DNO_GETTEXT USE_GETTEXT_SCHEME ?= fallthrough @@ -2591,7 +2584,6 @@ ifdef GIT_TEST_CMP_USE_COPIED_CONTEXT @echo GIT_TEST_CMP_USE_COPIED_CONTEXT=YesPlease >>$@+ endif @echo NO_GETTEXT=\''$(subst ','\'',$(subst ','\'',$(NO_GETTEXT)))'\' >>$@+ - @echo GETTEXT_POISON=\''$(subst ','\'',$(subst ','\'',$(GETTEXT_POISON)))'\' >>$@+ ifdef GIT_PERF_REPEAT_COUNT @echo GIT_PERF_REPEAT_COUNT=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_REPEAT_COUNT)))'\' >>$@+ endif diff --git a/gettext.c b/gettext.c index 7272771c8e..d71e394284 100644 --- a/gettext.c +++ b/gettext.c @@ -46,15 +46,13 @@ const char *get_preferred_languages(void) return NULL; } -#ifdef GETTEXT_POISON int use_gettext_poison(void) { static int poison_requested = -1; if (poison_requested == -1) - poison_requested = getenv("GIT_GETTEXT_POISON") ? 1 : 0; + poison_requested = git_env_bool("GIT_GETTEXT_POISON", 0); return poison_requested; } -#endif #ifndef NO_GETTEXT static int test_vsnprintf(const char *fmt, ...) diff --git a/gettext.h b/gettext.h index 7eee64a34f..4c492d9f57 100644 --- a/gettext.h +++ b/gettext.h @@ -41,11 +41,7 @@ static inline int gettext_width(const char *s) } #endif -#ifdef GETTEXT_POISON extern int use_gettext_poison(void); -#else -#define use_gettext_poison() 0 -#endif static inline FORMAT_PRESERVING(1) const char *_(const char *msgid) { diff --git a/po/README b/po/README index fef4c0f0b5..9dad277e33 100644 --- a/po/README +++ b/po/README @@ -289,16 +289,11 @@ something in the test suite might still depend on the US English version of the strings, e.g. to grep some error message or other output. -To smoke out issues like these Git can be compiled with gettext poison -support, at the top-level: +To smoke out issues like these Git tested with a translation mode that +emits gibberish on every call to gettext. To use it run the test suite +with it, e.g.: - make GETTEXT_POISON=YesPlease - -That'll give you a git which emits gibberish on every call to -gettext. It's obviously not meant to be installed, but you should run -the test suite with it: - - cd t && prove -j 9 ./t[0-9]*.sh + cd t && GIT_GETTEXT_POISON=true prove -j 9 ./t[0-9]*.sh If tests break with it you should inspect them manually and see if what you're translating is sane, i.e. that you're not translating diff --git a/t/README b/t/README index 8847489640..b3b2fa0b11 100644 --- a/t/README +++ b/t/README @@ -301,6 +301,11 @@ that cannot be easily covered by a few specific test cases. These could be enabled by running the test suite with correct GIT_TEST_ environment set. +GIT_GETTEXT_POISON=<boolean> turns all strings marked for translation +into gibberish. Used for spotting those tests that need to be marked +with a C_LOCALE_OUTPUT prerequisite when adding more strings for +translation. See "Testing marked strings" in po/README for details. + GIT_TEST_SPLIT_INDEX=<boolean> forces split-index mode on the whole test suite. Accept any boolean values that are accepted by git-config. diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 6b5836dc1b..3e75672a37 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -19,6 +19,7 @@ static struct test_cmd cmds[] = { { "dump-untracked-cache", cmd__dump_untracked_cache }, { "example-decorate", cmd__example_decorate }, { "genrandom", cmd__genrandom }, + { "gettext-poison", cmd__gettext_poison }, { "hashmap", cmd__hashmap }, { "index-version", cmd__index_version }, { "json-writer", cmd__json_writer }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index e4890566da..04f033b7fc 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -15,6 +15,7 @@ int cmd__dump_split_index(int argc, const char **argv); int cmd__dump_untracked_cache(int argc, const char **argv); int cmd__example_decorate(int argc, const char **argv); int cmd__genrandom(int argc, const char **argv); +int cmd__gettext_poison(int argc, const char **argv); int cmd__hashmap(int argc, const char **argv); int cmd__index_version(int argc, const char **argv); int cmd__json_writer(int argc, const char **argv); diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 78d8c3783b..4e8683addd 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -755,16 +755,16 @@ test_cmp_bin() { # Use this instead of test_cmp to compare files that contain expected and # actual output from git commands that can be translated. When running -# under GETTEXT_POISON this pretends that the command produced expected +# under GIT_GETTEXT_POISON this pretends that the command produced expected # results. test_i18ncmp () { - test -n "$GETTEXT_POISON" || test_cmp "$@" + ! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@" } # Use this instead of "grep expected-string actual" to see if the # output from a git command that can be translated either contains an # expected string, or does not contain an unwanted one. When running -# under GETTEXT_POISON this pretends that the command produced expected +# under GIT_GETTEXT_POISON this pretends that the command produced expected # results. test_i18ngrep () { eval "last_arg=\${$#}" @@ -779,7 +779,7 @@ test_i18ngrep () { error "bug in the test script: too few parameters to test_i18ngrep" fi - if test -n "$GETTEXT_POISON" + if ! test_have_prereq C_LOCALE_OUTPUT then # pretend success return 0 diff --git a/t/test-lib.sh b/t/test-lib.sh index 897e6fcc94..f82d2149a2 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -105,6 +105,7 @@ unset VISUAL EMAIL LANGUAGE COLUMNS $("$PERL_PATH" -e ' TRACE DEBUG TEST + GETTEXT_POISON .*_TEST PROVE VALGRIND @@ -1104,15 +1105,8 @@ test -n "$USE_LIBPCRE1" && test_set_prereq LIBPCRE1 test -n "$USE_LIBPCRE2" && test_set_prereq LIBPCRE2 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT -# Can we rely on git's output in the C locale? -if test -n "$GETTEXT_POISON" -then - GIT_GETTEXT_POISON=YesPlease - export GIT_GETTEXT_POISON - test_set_prereq GETTEXT_POISON -else - test_set_prereq C_LOCALE_OUTPUT -fi +test_lazy_prereq GETTEXT_POISON 'test-tool gettext-poison' +test_lazy_prereq C_LOCALE_OUTPUT '! test-tool gettext-poison' if test -z "$GIT_TEST_CHECK_CACHE_TREE" then