Re: [PATCH v3 7/7] git-sh-setup: don't mark trees not used in-tree for i18n

Johannes Sixt <j6t@xxxxxxxx> · Sat, 2 Apr 2022 16:16:41 +0200

Am 02.04.22 um 12:44 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Thu, Mar 31 2022, Johannes Sixt wrote:
> 
>> Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>>> infrastructure (the parts still in use would still have translations),
>>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>>> more interesting things in i18n (e.g. getting rid of the libintl
>>> dependency).
>>
>> Why? Why? Why? Does the status quo have a problem somewhere? All this
>> sounds like a change for the sake of change.
> 
> So this is quite the digression, but, hey, you asked for it.

Oh, no, this is not a digression *at all*. What your write below is the
kind of text that is needed to judge the value of a change. Without it,
a change that does not have an otherwise obvious improvement[*] is just
for the change's sake.

[*] In my book, getting rid of a libintl dependency is not an obvious
improvement. I may be biased in this case, because that dependency was
never a problem for me. Might be because my personal builds all have
NO_GETTEXT set.

> We don't have translations universally available because libintl is a
> rather heavy thing to ship.
> 
> I don't personally mind linking against it for my own builds, but grep
> for NO_GETTEXT in our tree & history for some of the workarounds.
> 
> We're also heading towards being able to build a stand-alone git binary
> for most things, which makes shipping in various setups much easier, but
> libintl is more of an "old-school" *nix library.
> 
> You need to ferry around auxilliary *.mo files, and for the *.sh and
> *.perl translations we need gettext.sh, /usr/bin/gettext and
> Locale::Messages (and everything that brings in).
> 
> I'd like translations for Git to Just Work, including if you're in some
> random docker image with someone's home-built git. Giving people fewer
> reasons to enable it improves accessibility. A lot of people who use git
> are not on their own personal laptop, but on some setup (corporate, CI
> etc.) that they don't fully control.
> 
> The gettext model & libintl is also just bad at various use-cases I
> think would make sense to support.
> 
> E.g. having a configurable option to emit output in two languages at the
> same time, either because you'd both like to understand the output &
> e.g. search errors online, or you'd understand more from a union of say
> German an English than from just one or the other.
> 
> For libintl you need'd to juggle setlocale() in the middle of your
> underlying sprintf implementation to do that, or pull other shenanigans
> of bypassing its API (e.g. directly reading the *.mo files), which
> pretty much amounts to the same thing.
> 
> So essentially I wanted to hack up something that would just
> post-process output like this:
> 
>     msgunfmt --strict -s -w 0 -i -E --color=always po/build/locale/de/LC_MESSAGES/git.mo
> 
> And turn it into a lang-de.c file, for which we'd make a lang-de.o that
> we'd link in. And then either binary search through it, or just generate
> code we'd compile (one really big switch/case statement).
> 
> Now, if you count the number of messages we translate in *.sh land on
> your digits you won't even need to use all of our toes, and for the
> *.perl it's similar, especially with add--interactive.perl going away
> any day now.
> 
> There isn't any fundamental obstacle to making such a thing portable to
> *.sh and *.perl, but having gotten that particular interop working once
> in the past needing to do that again would bring this (I think
> worthwhile) project from a "maybe someday" to "nah".

Just to make it clear: I am totally neutral on your goal. It's on others
to tell whether this is worth doing.

>>> But I also don't think that such a series is probably not possible in
>>> the near term if we're going to insist that all shellscript output must
>>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>>> mainly to do with sh-i18n--envsubst.c).
>>
>> Such an insistence can easily be lifted if the change is justified
>> sufficiently. I haven't seen such a justification, yet.
> 
> Sure, but re the "chicken & egg" problem I might do all the work to do
> all that, and someone such as yourself might rightly point out that it
> would break someone's obscure use-case, scuttling the whole thing.
> 
> Which isn't an exaggeration b.t.w., if you e.g. look through our
> remaining gettext.sh usage you'll find that we carry the entirety of
> sh-i18n--ensubst.c and everything around it (at least ~1k lines) for
> emitting a single word in a single message in git-sh-setup.sh, that's
> it.

See, someone thought it was a good idea to have i18n in shell scripts
and others agreed that it was worth having ~1k lines of code to do that.
So the code went in. From then on, these ~1k lines are *not a problem*
in themselves. From then on, the decision of having ~1k lines or not
having them can only be based on what effect they have, but no longer on
"oh, wow, that's 1k lines to write a single word; do we really want that"?

> 
> Because the whole reason eval_gettext exists, and everything to support
> it, is to support the use-case of feeding *arbitrary input* into the
> translation engine, i.e. not the string you yourself have in your source
> code & trust (it avoids shell "eval").
> 
> But if you think that's of paramount importance (that word is "usage"
> b.t.w., and the equivalent in usage.c isn't even translated) there
> wouldn't be any way to make forward progress towards the next step of
> making the remaining shellscript translations call some "git sh--i18n"
> helper to get their output.
> 
> So, to the extent that I was going to pursue the above plan at all I
> wanted to do it in small steps, especially now as git-submodule.sh et al
> are going away.
> 
> So.
> 
> It would be nice to get some leeway in some areas, especially for
> something like this where I implemented this entire i18n system to begin
> with, so I'd think it would be clear that it's not some drive-by
> contribution. I clearly care about the end-goal, and have been sticking
> with this particular topic for more than a decade.
> 
> Not everything can always be a single atomically understood patch that
> carries all possible reasons to make the change with it, some things are
> more of a longer term incremental effort.
> 
> And since we all have limited time on this spinning ball of mud
> sometimes it can make sense to trickle in initial changes to see if some
> larger end-goal is even attainable, or will be blocked due to some
> unforeseen (or underestimated) reasons.

You can't have leeway for a change whose conclusion is "removes some
miniscule feature". But if you add "Here is the secret plan to Scrat's
golden nut; let's start with this change, even though it removes some
miniscule feature", things are vastly different.

-- Hannes