Re: [PATCH v3 7/7] git-sh-setup: don't mark trees not used in-tree for i18n

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 02 Apr 2022 12:44:01 +0200

On Thu, Mar 31 2022, Johannes Sixt wrote:

> Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>> infrastructure (the parts still in use would still have translations),
>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>> more interesting things in i18n (e.g. getting rid of the libintl
>> dependency).
>
> Why? Why? Why? Does the status quo have a problem somewhere? All this
> sounds like a change for the sake of change.

So this is quite the digression, but, hey, you asked for it.

We don't have translations universally available because libintl is a
rather heavy thing to ship.

I don't personally mind linking against it for my own builds, but grep
for NO_GETTEXT in our tree & history for some of the workarounds.

We're also heading towards being able to build a stand-alone git binary
for most things, which makes shipping in various setups much easier, but
libintl is more of an "old-school" *nix library.

You need to ferry around auxiliary *.mo files, and for the *.sh and
*.perl translations we need gettext.sh, /usr/bin/gettext and
Locale::Messages (and everything that brings in).

I'd like translations for Git to Just Work, including if you're in some
random docker image with someone's home-built git. Giving people fewer
reasons to disable it improves accessibility. A lot of people who use git
are not on their own personal laptop, but on some setup (corporate, CI
etc.) that they don't fully control.

The gettext model & libintl is also just bad at various use-cases I
think would make sense to support.

E.g. having a configurable option to emit output in two languages at the
same time, either because you'd both like to understand the output &
e.g. search errors online, or you'd understand more from a union of say
German and English than from just one or the other.

For libintl you'd need to juggle setlocale() in the middle of your
underlying sprintf implementation to do that, or pull other shenanigans
of bypassing its API (e.g. directly reading the *.mo files), which
pretty much amounts to the same thing.

So essentially I wanted to hack up something that would just
post-process output like this:

    msgunfmt --strict -s -w 0 -i -E --color=always po/build/locale/de/LC_MESSAGES/git.mo

And turn it into a lang-de.c file, for which we'd make a lang-de.o that
we'd link in. And then either binary search through it, or just generate
code we'd compile (one really big switch/case statement).
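
To make the binary search variant concrete, here is a rough sketch of
what such a generated file plus lookup could look like. Everything in it
is hypothetical: the "struct lang_entry" layout, the "lang_de" table,
the lang_lookup() name and the three sample entries are made up for
illustration, and a real lang-de.c would be generated from the msgunfmt
output rather than written by hand.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of one generated catalog entry. */
struct lang_entry {
	const char *msgid;
	const char *msgstr;
};

/*
 * Stand-in for a generated lang-de.c. Entries must be sorted by
 * msgid in strcmp() order; the translations are illustrative only.
 */
static const struct lang_entry lang_de[] = {
	{ "branch", "Branch" },
	{ "file", "Datei" },
	{ "usage", "Verwendung" },
};

/* bsearch() comparator: the key is a msgid string, elem is an entry. */
static int entry_cmp(const void *key, const void *elem)
{
	const struct lang_entry *e = elem;
	return strcmp(key, e->msgid);
}

/* Return the translation, or the msgid itself when there is none. */
static const char *lang_lookup(const char *msgid)
{
	const struct lang_entry *e;

	e = bsearch(msgid, lang_de, sizeof(lang_de) / sizeof(lang_de[0]),
		    sizeof(lang_de[0]), entry_cmp);
	return e ? e->msgstr : msgid;
}
```

Falling back to the msgid on a miss mirrors what gettext() does for an
untranslated string, so callers wouldn't need to care whether a given
message has a translation.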

Now, if you count the number of messages we translate in *.sh land on
your digits you won't even need to use all of your toes, and for the
*.perl it's similar, especially with add--interactive.perl going away
any day now.

There isn't any fundamental obstacle to making such a thing portable to
*.sh and *.perl, but having gotten that particular interop working once
in the past, needing to do it again would take this (I think
worthwhile) project from a "maybe someday" to a "nah".

>> But I also think that such a series is probably not possible in
>> the near term if we're going to insist that all shellscript output must
>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>> mainly to do with sh-i18n--envsubst.c).
>
> Such an insistence can easily be lifted if the change is justified
> sufficiently. I haven't seen such a justification, yet.

Sure, but re the "chicken & egg" problem I might do all the work to do
all that, and someone such as yourself might rightly point out that it
would break someone's obscure use-case, scuttling the whole thing.

Which isn't an exaggeration b.t.w., if you e.g. look through our
remaining gettext.sh usage you'll find that we carry the entirety of
sh-i18n--envsubst.c and everything around it (at least ~1k lines) for
emitting a single word in a single message in git-sh-setup.sh, that's
it.

Because the whole reason eval_gettext exists, and everything to support
it, is to support the use-case of feeding *arbitrary input* into the
translation engine, i.e. not the string you yourself have in your source
code & trust (it avoids shell "eval").
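
To illustrate the general idea of substituting variables without going
through shell "eval", here is a minimal sketch. It is not the code in
sh-i18n--envsubst.c, and the expand_vars() name and its interface are
made up for this example. It only shows the principle: "$name"
references are looked up in the environment, and everything else is
copied through as plain text, so nothing in the (possibly translated)
string is ever executed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand "$name" references in `in` from the environment, copying
 * everything else through untouched. Nothing is ever executed, so
 * text like "$(oops)" or backticks stays literal. Unset variables
 * expand to the empty string.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int expand_vars(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	if (!outsz)
		return -1;

	while (*in) {
		if (in[0] == '$' &&
		    (isalpha((unsigned char)in[1]) || in[1] == '_')) {
			char name[64];
			size_t len = 0;
			const char *val;

			in++; /* skip the '$' */
			while ((isalnum((unsigned char)*in) || *in == '_') &&
			       len < sizeof(name) - 1)
				name[len++] = *in++;
			name[len] = '\0';

			val = getenv(name);
			if (!val)
				val = "";
			if (strlen(val) >= outsz - n)
				return -1;
			strcpy(out + n, val);
			n += strlen(val);
		} else {
			/* Ordinary character: copy it, leaving room for NUL. */
			if (n + 1 >= outsz)
				return -1;
			out[n++] = *in++;
		}
	}
	out[n] = '\0';
	return 0;
}
```

With "dashless" exported as "git-foo", a string like
"usage: $dashless $(oops)" would come out as "usage: git-foo $(oops)",
whereas feeding the same string to shell "eval" would try to run "oops".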

But if you think that's of paramount importance (that word is "usage"
b.t.w., and the equivalent in usage.c isn't even translated) there
wouldn't be any way to make forward progress towards the next step of
making the remaining shellscript translations call some "git sh--i18n"
helper to get their output.

So, to the extent that I was going to pursue the above plan at all I
wanted to do it in small steps, especially now as git-submodule.sh et al
are going away.

So.

It would be nice to get some leeway in some areas, especially for
something like this where I implemented this entire i18n system to begin
with, so I'd think it would be clear that it's not some drive-by
contribution. I clearly care about the end-goal, and have been sticking
with this particular topic for more than a decade.

Not everything can always be a single atomically understood patch that
carries all possible reasons to make the change with it, some things are
more of a longer term incremental effort.

And since we all have limited time on this spinning ball of mud
sometimes it can make sense to trickle in initial changes to see if some
larger end-goal is even attainable, or will be blocked due to some
unforeseen (or underestimated) reasons.

Thanks.