On Wed, Apr 26, 2017 at 11:17 AM, Michael J Gruber <git@xxxxxxxxx> wrote: > [Turns out I still can't operate gmail's web interface. Sorry for the dupe.] > > 2017-04-24 13:04 GMT+02:00 Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>: >> Remove the GETTEXT_POISON=YesPlease compile-time which turns all of >> git's LC_*=C output into strings like "# GETTEXT POISON #" instead of >> gettext(msgid). >> >> See commit bb946bba76 ("i18n: add GETTEXT_POISON to simulate >> unfriendly translator", 2011-02-22) for what this was originally >> intended for. >> >> This facility has been broken for quite a while and has been subjected >> to frequent bitrot. The initial idea behind it back when it was added >> in 2011 was to prevent the accidental translation of plumbing >> messages. >> >> This isn't a big concern anymore as git isn't mass-adding i18n >> messages for a newly developed i18n facility as it was back then, >> maintaining this facility incurs a burden, and in actuality this has >> often been broken long enough for potential plumbing messages to be >> translated & make their way into major releases anyway. >> >> Most of this patch consists of search/replacing the test suite for: >> >> test_i18ngrep ! -> ! grep >> test_i18ngrep -> grep >> test_i18ncmp -> test_cmp >> >> 1. <AANLkTi=5MrU-JyeQ3UVNbVwzn-8FbstUXafgcQaLWXDB@xxxxxxxxxxxxxx> >> (https://public-inbox.org/git/AANLkTi=5MrU-JyeQ3UVNbVwzn-8FbstUXafgcQaLWXDB@xxxxxxxxxxxxxx/) >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> >> --- >> >> On Mon, Apr 24, 2017 at 3:18 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> Michael J Gruber <git@xxxxxxxxx> writes: >>> >>>> Ævar Arnfjörð Bjarmason venit, vidit, dixit 20.04.2017 23:58: >>>>> As a refresh of everyone's memory (because mine needed it). This is a >>>>> feature I added back in 2011 when the i18n support was initially >>>>> added. >>>>> >>>>> There was concern at the time that we would inadvertently mark >>>>> plumbing messages for translation, particularly something in a shared >>>>> code path, and this was a way to hopefully smoke out those issues with >>>>> the test suite. >>>>> >>>>> However compiling with it breaks a couple of dozen tests, I stopped >>>>> digging when I saw some broke back in 2014. >>>>> >>>>> What should be done about this? I think if we're going to keep them >>>>> they need to be run regularly by something like Travis (Lars CC'd), >>>>> however empirical evidence suggests that not running them is just fine >>>>> too, so should we just remove support for this test mode? >>>>> >>>>> I don't care, but I can come up with the patch either way, but would >>>>> only be motivated to write the one-time fix for it if some CI system >>>>> is actually running them regularly, otherwise they'll just be subject >>>>> to bitrotting again. >>>> >>>> I use that switch when I change something that involves l10n, but >>>> usually I run specific tests only. To be honest: I have to make sure not >>>> to get confused by (nor forget one of) the build flag GETTEXT_POISON and >>>> the environment variable GIT_GETTEXT_POISON. I'm not sure I always >>>> tested what I meant to test... >>> >>> To be quite honest, I have always felt that we are just as likely >>> inadvertently use test_i18ncmp when we should use test_cmp (and vice >>> versa) as we would mark plumbing messages with _() by mistake with >>> this approach, and even with constant monitoring by something like >>> Travis, GETTEXT_POISON may be able to catch mistakes only some of >>> the time (i.e. when we do not make mistakes in writing our tests). >>> Without constant monitoring, I agree that the mechanism does not >>> work well to catch our mistakes. >> >> Here's an alternate patch to just remove it entirely. I think we >> should apply this instead, the only reason I sent the patch to fix it >> up was because of Michael's comment that he was occasionally using it. > > Yes, I think test_i18ngrep and test_i18ncmp gave the impression that > they are i18n-aware grep and cmp, whereas in fact they turned off > these tese test lines completely. > Combined with the fact that GETTEXT_POSON builds turned on > GIT_GETTEXT_POISON, this sounds somewhat dangerous - we test more > aspects of plumbing commands but turn off (some) tests for porcelain. It gives us no less test coverage because we still run the tests without GETTEXT_POSON. Thus running first without that & then with gives 100% coverage. The entire point of the GETTEXT_POSON mode is that we've already manually gone over things that broke in the past, and are skipping them with C_LOCALE_OUTPUT or one of these i18n functions, and when adding new translations we keep an eye out for new breakages that haven't been marked yet. These are then a candidate either for marking to skip, if they're porcelain commands using our new gettext()-utilizing code, or we've made a mistake and translated some plumbing, in which case that would hopefully be caught by list review or Travis. I still think it's better to just get rid of it, but this aspect of it is working exactly as intended. > I'm still wondering wether we couldn't generate a test locale > automatically by mangling the english strings (but preserving format > specifiers). That way, tests that test porcelain output could require > LANG=C while others could run in the mangled locale. (tt_TT is taken, > though ;) ) If we wanted to keep GETTEXT_POSON this would be going further in the wrong direction. The reason the message starts with a "#", has no trailing \n & most importantly ignores any format specifiers the real translation would have is exactly because we'd like as many things as can break break under this mode. If anything we should consider changing "# GETTEXT_POISON #" to "<random binary garbage>" if we keep it. Including format specifiers would just reduce the set of things that would break, consider e.g. a test that merely greps for some substring that would be emitted by a "%s branch %d commits ahead" format, now we change that to "%s GIBBER %d ISH", but the test still passes because it just does $(awk '{print $1}'), but it might be a plumbing message that could be translated as e.g. "%d WORD1 %s WORD2 WORD3", breaking the API.