On Thu, Mar 31 2022, Johannes Sixt wrote:

> On 31.03.22 at 13:15, Ævar Arnfjörð Bjarmason wrote:
>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>> infrastructure (the parts still in use would still have translations),
>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>> more interesting things in i18n (e.g. getting rid of the libintl
>> dependency).
>
> Why? Why? Why? Does the status quo have a problem somewhere? All this
> sounds like a change for the sake of change.

So this is quite the digression, but, hey, you asked for it.

We don't have translations universally available because libintl is a
rather heavy thing to ship. I don't personally mind linking against it
for my own builds, but grep for NO_GETTEXT in our tree & history for
some of the workarounds.

We're also heading towards being able to build a stand-alone git binary
for most things, which makes shipping in various setups much easier, but
libintl is more of an "old-school" *nix library. You need to ferry
around auxiliary *.mo files, and for the *.sh and *.perl translations we
need gettext.sh, /usr/bin/gettext and Locale::Messages (and everything
that brings in).

I'd like translations for Git to Just Work, including if you're in some
random docker image with someone's home-built git. Giving people fewer
reasons to disable it improves accessibility. A lot of people who use
git are not on their own personal laptop, but on some setup (corporate,
CI etc.) that they don't fully control.

The gettext model & libintl is also just bad at various use-cases I
think would make sense to support. E.g. having a configurable option to
emit output in two languages at the same time, either because you'd both
like to understand the output & e.g. search errors online, or you'd
understand more from a union of say German and English than from just
one or the other.
For libintl you'd need to juggle setlocale() in the middle of your
underlying sprintf implementation to do that, or pull other shenanigans
of bypassing its API (e.g. directly reading the *.mo files), which
pretty much amounts to the same thing.

So essentially I wanted to hack up something that would just
post-process output like this:

    msgunfmt --strict -s -w 0 -i -E --color=always po/build/locale/de/LC_MESSAGES/git.mo

And turn it into a lang-de.c file, for which we'd make a lang-de.o that
we'd link in. And then either binary search through it, or just generate
code we'd compile (one really big switch/case statement).

Now, if you count the number of messages we translate in *.sh land on
your digits you won't even need to use all of your toes, and for the
*.perl it's similar, especially with add--interactive.perl going away
any day now.

There isn't any fundamental obstacle to making such a thing portable to
*.sh and *.perl, but having gotten that particular interop working once
in the past, needing to do that again would bring this (I think
worthwhile) project from a "maybe someday" to "nah".

>> But I also don't think that such a series is probably not possible in
>> the near term if we're going to insist that all shellscript output must
>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>> mainly to do with sh-i18n--envsubst.c).
>
> Such an insistence can easily be lifted if the change is justified
> sufficiently. I haven't seen such a justification, yet.

Sure, but re the "chicken & egg" problem I might do all the work to do
all that, and someone such as yourself might rightly point out that it
would break someone's obscure use-case, scuttling the whole thing.

Which isn't an exaggeration b.t.w., if you e.g. look through our
remaining gettext.sh usage you'll find that we carry the entirety of
sh-i18n--envsubst.c and everything around it (at least ~1k lines) for
emitting a single word in a single message in git-sh-setup.sh, that's
it.
Because the whole reason eval_gettext exists, and everything to support
it, is to support the use-case of feeding *arbitrary input* into the
translation engine, i.e. not the string you yourself have in your source
code & trust (it avoids shell "eval").

But if you think that's of paramount importance (that word is "usage"
b.t.w., and the equivalent in usage.c isn't even translated) there
wouldn't be any way to make forward progress towards the next step of
making the remaining shellscript translations call some "git sh--i18n"
helper to get their output.

So, to the extent that I was going to pursue the above plan at all I
wanted to do it in small steps, especially now as git-submodule.sh et al
are going away.

So. It would be nice to get some leeway in some areas, especially for
something like this where I implemented this entire i18n system to begin
with, so I'd think it would be clear that it's not some drive-by
contribution. I clearly care about the end-goal, and have been sticking
with this particular topic for more than a decade.

Not everything can always be a single atomically understood patch that
carries all possible reasons to make the change with it; some things are
more of a longer term incremental effort.

And since we all have limited time on this spinning ball of mud
sometimes it can make sense to trickle in initial changes to see if some
larger end-goal is even attainable, or will be blocked due to some
unforeseen (or underestimated) reasons.

Thanks.
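
For illustration only, here is a minimal sketch of what a generated
lang-de.c with a binary-search lookup might look like. Everything in it
is an assumption made up for this sketch: the names (struct msg,
lang_de, lookup_de) are hypothetical, and the three entries are
placeholder strings, not taken from git.mo. It assumes the generator
emits the table sorted by msgid in strcmp() order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical row layout for a generated lang-de.c: one msgid/msgstr pair. */
struct msg {
	const char *id;
	const char *str;
};

/*
 * Placeholder entries (not real git.mo content). The generator is
 * assumed to emit them sorted by msgid, which bsearch() requires.
 */
static const struct msg lang_de[] = {
	{ "done", "fertig" },
	{ "error", "Fehler" },
	{ "usage", "Verwendung" },
};

/* bsearch() comparator: the key is a plain msgid, the element a table row. */
static int msg_cmp(const void *key, const void *elem)
{
	return strcmp(key, ((const struct msg *)elem)->id);
}

/* Return the translation, or the msgid itself if there is no entry. */
const char *lookup_de(const char *id)
{
	const struct msg *m = bsearch(id, lang_de,
				      sizeof(lang_de) / sizeof(lang_de[0]),
				      sizeof(lang_de[0]), msg_cmp);
	return m ? m->str : id;
}
```

With one such table per language linked in, emitting a message in two
languages would in principle be two lookups on the same msgid, with no
setlocale() juggling involved. Whether a sorted table or a generated
switch/case is the better fit is left open here.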