On Wed, May 25 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> as promised in the Git IRC Standup [*1*], a review.
>
> On Wed, 18 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> Ævar Arnfjörð Bjarmason (8):
>>   run-command tests: change if/if/... to if/else if/else
>>   run-command API: use "opts" struct for run_processes_parallel{,_tr2}()
>>   run-command tests: test stdout of run_command_parallel()
>>   run-command.c: add an initializer for "struct parallel_processes"
>>   run-command: add an "ungroup" option to run_process_parallel()
>>   hook tests: fix redirection logic error in 96e7225b310
>>   hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
>>   hook API: fix v2.36.0 regression: hooks should be connected to a TTY
>
> I started reviewing the patches individually, but have some
> higher-level concerns that put my per-patch review on hold.
>
> Keeping in mind that the intention is to fix a regression that was
> introduced by way of refactoring (most of our recent regressions seem
> to share that trait [*2*]), I strongly advise against another round of
> refactoring [*3*], especially against tying it to fix a regression.
>
> In this instance, it would be very easy to fix the bug without any
> refactoring. In a nutshell, the manifestation of the bug amplifies
> this part of the commit message of 96e7225b310 (hook: add 'run'
> subcommand, 2021-12-22):
>
>     Some of the implementation here, such as a function being named
>     run_hooks_opt() when it's tasked with running one hook, to using
>     the run_processes_parallel_tr2() API to run with jobs=1 is
>     somewhere between a bit odd and and an overkill for the current
>     features of this "hook run" command and the hook.[ch] API.
>
> It is this switch to `run_processes_parallel()` that is the root cause
> of the regression.

Yes, or more generally to the new hook API which makes use of it.

> The current iteration of the patch series does not fix that.

Because the plan is still to continue in this direction and go for
Emily's config-based hooks, which will run in parallel.

And fixing it that way would at this point be a larger functional
change, because we'd be running with more code we haven't tested
before, i.e. hook.[ch] on some new backend. So just passing down the
appropriate flags to have run-command.[ch] do the right thing for us
seemed to be the least bad option.

> In the commit message from which I quoted, the plan is laid out to
> eventually run more than one hook. If that is still the plan, we will
> be presented with the unfortunate choice to either never running them
> in parallel, or alternatively reintroducing the regression where the
> hooks run detached from stdin/stdout/stderr.

No, because you can have N processes all connected to a terminal with
"ungroup"; what you can't do is guarantee that they won't interleave.

But as discussed in some previous threads that would be OK, since it
would come as an opt-in to parallel hook execution. I.e. you could pick
one of:

 1. Current behavior
 2. Our parallel hook execution (whatever "ungroup" etc. settings that
    entails)
 3. Your own parallel hook execution

It only matters that we don't regress in #1; for #2 we could have
different behavior and just document the caveats as such.

IOW it's OK if you run parallel hooks and we decide that they won't be
connected to a terminal, because that's a new feature we don't have
yet, one you'd need to opt into.
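To make the "ungroup" part of that concrete, this is roughly what the
hook code ends up asking run-command.[ch] to do. It's a paraphrase, not
a copy of the patches; the struct and field names below are
illustrative and may not match the series exactly, only the hook.c
callbacks are the existing ones:

    /*
     * Sketch only: with "ungroup" the children inherit the parent's
     * stdin/stdout/stderr (so a TTY stays a TTY), instead of having
     * their output buffered and emitted as one block per process.
     * The trade-off is that nothing stops the output of concurrent
     * children from interleaving.
     */
    struct run_process_parallel_opts opts = {
        .jobs = 1,              /* today: one hook at a time */
        .ungroup = 1,           /* stay connected to the terminal */
        .get_next_task = pick_next_hook,
        .start_failure = notify_start_failure,
        .task_finished = notify_hook_finished,
        .data = &cb_data,
    };

    run_processes_parallel(&opts);

The point being that a future "jobs > 1" mode only changes how much
interleaving you're willing to accept, not whether the processes can be
connected to a terminal at all.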
> It is pretty clear that there is no actual choice, and the hooks will
> never be able to run in parallel. Therefore, the fix should move
> `run_hooks_opt()` away from calling `run_processes_parallel()`.

They will run in parallel, see above.

> In any case, regression fixes should not be mixed with refactorings
> unless the latter make the former easier, which is not the case here.

I noted upthread/side-thread (in any case, in discussions around this)
that I wished I'd come up with something smaller, but couldn't. If you
want to try your hand at that I'd love to see it.

But basically migrating the hook API to a new "backend" would also be a
large change, as would making the bare minimum change in
run-command.[ch]. But hey, I might be wrong. So if you think it's
obvious that this could be much smaller I'd love to see patches for
it...

> Footnote *1*:
> https://colabti.org/irclogger/irclogger_log/git-devel?date=2022-05-23#l44
>
> Footnote *2*: I say "seem" because it would take a proper retro to
> analyze what was the reason for the uptick in regressions, and even
> more importantly to analyze what we can learn from the experience.

Yes, that might be interesting. I'll only note that I think you're
focusing on the wrong thing here with "refactorings".

If you look at the history of this hooks API topic it started early on
with a version where the config-based hooks + parallelism (currently
not here yet) were combined with making the existing hook users use the
new API (partially here now).

Now, I suggested that be split up so that we'd first re-implement all
existing hooks on the new API, and *then* perform any feature changes.
Except of course by doing so that alters the nature of those changes in
your definition, I assume, i.e. it goes from a feature series to
"refactorings".

Whereas I think the important thing to optimize for is to make smaller
incremental changes. Here we had a bug, and it's relatively easy to fix
it; it would be much harder if we had a bigger delta in v2.36 with not
only this bug, but some other regressions as well.

Which isn't hypothetical BTW: until 3-4 months ago nobody had seen that
the config-based hooks topic we had kicking around had a severe
performance regression. I found it, and Emily & I have been kicking
around a fix for it (mostly off-list). If we'd shipped that we'd have a
more broken release, but we also wouldn't have "refactorings", i.e. the
run_parallel API would actually be used, but we'd have this breakage
plus some others.

Anyway, I think there's lots of things we could probably do better in
delivering more reliable software. I'm just pointing out that picking
one part of a larger progression from A..B and calling it bad because
it refactored something is a categorical mistake: re-doing those steps
so that none of them involves a refactoring would result in larger
change deltas.

> Footnote *3*: The refactoring, as Junio suspected, might very well go
> a bit over board. Even if a new variation of the
> `run_processes_parallel()` function that takes a struct should be
> necessary, it would be easy -- and desirable -- to keep the current
> function signatures unchanged and simply turn them into shims that
> then call the new variant. That would make the refactoring much easier
> to review, and in turn it would make it less likely to introduce
> another regression.

Sure, we could instead add a third variant to it in addition to the two
on "master", instead of unifying them as is done here.
But per the v1 feedback the consensus seemed to be that this was a good
direction, and to the extent that there were objections it was that I
should add the rest of the arguments to the "opts" struct.

But again, I'm fully open to that. I tried it and didn't think the end
result was any simpler to review, but perhaps you'd like to try...
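Just so we're talking about the same thing: the shim approach you're
describing would look something like the below. An untested sketch:
"run_processes_parallel_opts()" and the struct/field names are made up
here for illustration, only the two existing signatures are the ones on
"master":

    int run_processes_parallel(int n, get_next_task_fn get_next_task,
                               start_failure_fn start_failure,
                               task_finished_fn task_finished, void *pp_cb)
    {
        /* Shim: forward to the (hypothetical) opts-based variant */
        struct run_process_parallel_opts opts = {
            .jobs = n,
            .get_next_task = get_next_task,
            .start_failure = start_failure,
            .task_finished = task_finished,
            .data = pp_cb,
        };

        return run_processes_parallel_opts(&opts);
    }

    int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
                                   start_failure_fn start_failure,
                                   task_finished_fn task_finished,
                                   void *pp_cb, const char *tr2_category,
                                   const char *tr2_label)
    {
        /* Same, but also carrying the trace2 category/label */
        struct run_process_parallel_opts opts = {
            .jobs = n,
            .get_next_task = get_next_task,
            .start_failure = start_failure,
            .task_finished = task_finished,
            .data = pp_cb,
            .tr2_category = tr2_category,
            .tr2_label = tr2_label,
        };

        return run_processes_parallel_opts(&opts);
    }

That would keep the existing call sites untouched, at the cost of
carrying three entry points instead of unifying them as the series as
posted does.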