Re: [PATCH v3 7/7] git-sh-setup: don't mark trees not used in-tree for i18n

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 31 2022, Phillip Wood wrote:

[tl;dr: Reply below, but this whole thing should be addressed by the v4
I sent last night:
https://lore.kernel.org/git/cover-v4-0.6-00000000000-20220331T014349Z-avarab@xxxxxxxxx/

I.e. the controversial patch has been ejected].

> On 28/03/2022 13:16, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Mar 28 2022, Johannes Sixt wrote:
>> 
>>> Am 27.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> On Sun, Mar 27 2022, Johannes Sixt wrote:
>>>>
>>>>> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>>>>>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>>>>>> translation, 2016-06-17).
>>>>>>
>>>>>> These strings are no longer used in-tree, and we shouldn't be wasting
>>>>>> translator time on them for the benefit of a hypothetical out-of-tree
>>>>>> user of git-sh-setup.sh.
>
> The out of tree users of git-sh-setup.sh are not hypothetical, they
> exist and objected when you recently tried to remove these functions 
> entirely[1].

I see that what I wrote there is ambiguous, but I'm aware of that &
remember that thread. I meant to say the hypothetical user that cares
about the i18n these functions exposed.

>>>>> There is public documentation for these functions. Hence, you must
>>>>> assume that they are used in scripts outside of Git. Castrating their
>>>>> functionality like this patch does is unacceptable.
>>>>
>>>> For require_clean_work_tree() the public documentation for this function
>>>> states that it will emit a specific error message in English, and you're
>>>> expected to Lego-interpolate a string that we'll concatenate with it:
>
> The documentation does not say whether the message is translated or
> not, probably because it was not updated when the translations were
> added six years ago.

It does say it. It uses the word "Cannot" at the beginning, and promises
to emit that specific string.

Yes we didn't update it at the time for i18n, and probably should.

But to the extent that the gordian knot in making any changes to these
whatsoever is because they've been publicly documented I don't think
anyone using these has been promised different behavior. So it's highly
relevant here.

>>>> 	[...]It emits an error message of the form `Cannot
>>>>          <action>: <reason>. <hint>`, and dies.  Example:
>
> This is not a promising a "specific error message in English"

It really is. You cannot use this API to produce sensible output in any
other language. It was used like this:

    require_clean_work_tree "pull with rebase" "Please commit or stash them."

For which we'd emit:

    Cannot pull with rebase: You have unstaged changes.

    Please commit or stash them.

You can see e.g. in the Bulgarian translation that this was dealt with
by putting the interpolated string in double-quotes.

>>>> 	+
>>>> 	----------------
>>>> 	require_clean_work_tree rebase "Please commit or stash them."
>
> This is an example message you cannot use that to argue that we will
> always show a message in English

I'm saying that the documentation says it emits English, that it didn't
always do that, and now does so again.

And that to get it to emit anything sensible in cases where we're not
under LC_ALL=C would have required 1=1 matching the behavior of whatever
shellscript is using this to what git-sh-i18n in picking the locale.

I don't think it's plausible that there's an out-of-tree user
maintaining their own set of i18n'd po/ files which expect to interact
with our translations in this way.

Any out-of-tree user of this (if they're using this at all) will either
not care, or they'll see more sensible output again.

>>>> So I think that marking it for translation like this as d323c6b6410 was
>>>> always broken in that it broke that documented promise.
>>>
>>> I can buy this argument. But then this must be a separate commit with
>>> this justification.
>> Sure, I can elaborate on that point & split it up.
>> 
>>>> But that's just an argument for keeping the require_clean_work_tree()
>>>> part of this patch, not require_work_tree_exists().
>>>>
>>>> For that one and others in git-sh-setup we've never said that we'd
>>>> provide these translated (and to the extent we've implied anything about
>>>> the rest, have strongly implied the opposite with
>>>> require_clean_work_tree()'s docs).
>>>>
>>>> Nothing will break for out-of-tree users as a result of this
>>>> change.
>
> The strings the user sees will change

Yes, and I'll admit that "nothing will break here" on my part isn't the
same as saying "there will be no observable change whatsoever". Sorry
about being unclear there.

As a general matter we don't promise that such strings won't change,
even for die(), error() etc. messages emitted by plumbing commands.

Except in some rare cases where they've been known to be used out of
tree extensively, e.g. the human-readable "merge" messages where we
have/had no other API to expose the same information.

Or, in the case of plumbing output where such strings are part of the
API contract.

But for these commands in the "Internal helper commands" category I
think this fall squarely in the category of changing a random error(),
die() etc. in the C code (which we do quite freely).

>>>> When we added these functions and their documentation their
>>>> output wouldn't be translated,
>
> Where does the documentation say "the output will not be translated"?

I think this was covered above, it's sufficient that it didn't promise
that it would be, and in the one case where we discuss it in passing
with an example we imply that it won't be.

>>>> then sometimes it was, now it's not
>>>> again.
>>>
>>> This does not sound convincing at all, but rather like "I want the code
>>> to be so, and here is some handwaving to justify it". What is wrong with
>>> the status quo?
>> The larger context for why I was looking at this again is that I'm
>> trying to slowly get us to the point where we can remove the
>> i18n-in-shellscript entirtely.
>> But I thought that was a rather large digression for the commit
>> message,
>> and that these being both unused, and not something the "public" API
>> affected ever promised it would do was sufficient.
>
> I think if that is what you want to do then you should propose a
> series that does just that and explains why it is desirable, rather
> than coming up with other reasons to justify the changes you want.

Just because I start looking at some code for reason X that doesn't mean
that submitting a patch with rationale Y isn't a sufficient reason to
make that change.

I still think that in this case that they're not used by our own i18n
effort is a perfectly sufficient reason to make the change, as we won't
waste translator time in it. I.e. I'll still stand behind the stated
rationale.

But aside from that most changes I made to git are with an eye to some
larger semi-related goal.

I do have some WIP changes to tear down most of the *.sh and *.perl i18n
infrastructure (the parts still in use would still have translations),
and IIRC it's at least a 2k line negative diffstat, and enables us to do
more interesting things in i18n (e.g. getting rid of the libintl
dependency).

But I also don't think that such a series is probably not possible in
the near term if we're going to insist that all shellscript output must
byte-for-byte be the same (for boring reasons I won't go into, but it's
mainly to do with sh-i18n--envsubst.c).

So it's also a bit of a chicke & egg problem. I wanted to send any such
UI changes in first, to see if it was even worth finishing up that work,
or if the whole thing would stall on not being able to change some
output someone somewhere might have relied on being byte-for-byte the
same.

>>>> We need also need to be mindful of translator time, it's a *lot* of
>>>> strings to go through, and e.g. I've commented in the past on patches
>>>> that marked stuff in t/helper/ for translation.
>>>
>>> Translator's time is your concern? No translator sifts through 5000
>>> strings on every release. There are tools that show only new strings to
>>> translate.
>> Yes, I'm the person who added this entire i18n infrastructure to
>> git, I
>> know how it works :)
>> 
>>> A text is translated once and then it lies under the radar
>>> until someone changes it. Don't tell me that is time-consuming.
>> Yes, individual orphaned strings aren't, but they add up.
>> Just like having that "USE_PIC" comment in configure.ac isn't much
>> of a
>> big deal, but it makes sense to clean up unused code, just as we're
>> adding new code.
>> I will say that your implicit proposal of keeping this forever
>> instead
>> is assuming that we won't have more translations for git, and every new
>> translator will look at this.
>> Context is critical for translators, so even if it's one string it's
>> a
>> string you'll quickly grep for and find ... no uses for, and then likely
>> go hunting around for where it's used only to (hopefully, in that case)
>> find this thread. Better not to have it.
>> 
>>> On the other hand, there is a lot of *reviewer* time that you are
>>> spending with changes like this. *That* should be your concern.
>> I'd think most of the that time, if any, will be spent on this
>> sub-thread you started, so ... :)
>
> This sub-tread exists because you posted this patch to the mailing
> list. Blaming reviewers for asking perfectly reasonable questions is
> neither fair nor helpful.

I didn't mean any offense there, but did mean to suggest (smiley an all)
that a mountain was being made out a molehill in this case.

Yes translator time is my concern. I started the i18n effort in git, and
I think it's really important. We currently have 18 translations of git
in the po/ directory, 16 if you leave out "dialects". Which if you
compare it with
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
is quite bad.

For comparison I worked extensively on MediaWiki in a past life, which
at the time had at least 100 such translations. I looked again and it's
up to around 600 (many incomplete, to be fair).

Is that our fault as project? No, but we could definitely help it along.

I value the scarcity of translator time (including future translations)
much more than concerns that there *may be* someone somewhere who's got
a reliance on this particular output.

> This patch does not remove dead code as the rest of the series does
> but instead changes user facing messages in code that we recently 
> established is part of the public api[2]. Nothing has changed since
> that recent discussion so I'm confused as to why you are proposing to
> modify the api again so soon.

As noted above I don't think that previous discussion applies to these
changes as you describe, but in any case, ~8 hours before you sent this
reply I sent a v4 re-roll which left out this change:

    https://lore.kernel.org/git/cover-v4-0.6-00000000000-20220331T014349Z-avarab@xxxxxxxxx/

Which I hope will address your & Johannes Sixt's concerns here. Does the
rest of this series look good to you?




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux