Re: git 2.36.0 regression: pre-commit hooks no longer have stdout/stderr as tty

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 20 Apr 2022 09:42:01 -0700

Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

>> It *is* true that run-command.c:pp_start_one() sets child_process:err=-1
>> for the child and run-command.c:run_hook_ve() didn't do that; that -1
>> means that start_command() will create a new fd for the child's stderr.
>> Since run_hook_ve() didn't care about the child's stderr before, I
>> wonder if that is why? Could it be that now that we're processing the
>> child's stderr, the child no longer thinks stderr is in tty, because the
>> parent is consuming its output?
>
> Exactly, stderr is redirected to a pipe so that we can buffer the
> output from each process and then write it to the real stdout when the
> process has finished to avoid the output from different processes
> getting mixed together. Ideally in this case we'd see that stdout is a
> tty and create a pty rather than a pipe when buffering the output from
> the process.

Ah, thanks, and sigh.  That means this was an unintended regression
caused by use of parallel infrastructure, mixed with a bit of "the
original problem report wrote hook properly so that when it is not
connected to a terminal (such as in this new implementation) it
refrains to do terminal-y things like coloring, so everything is
working as intended" ;-).

IIRC, the parallel subprocess stuff was invented to spawn multiple
tasks we internally need (like "checkout these submodules") that are
not interactive (hence does not need access to stdin) en masse, and
the output buffering is there to avoid interleaving the output that
would make it unreadable.

Use of the parallel subprocess API means that we inherently cannot
give access to the standard input to the hooks.  The users of the
original run_hooks_ve() API would be OK with that, because it did
.no_stdin=1 before the problematic hooks API rewrite, but I wonder
what our plans should be for hooks that want to go interactive.
They could open /dev/tty themselves (and that would have been the
only way to go interactive even in the old world order, so it is
perfectly acceptable to keep it that way with .no_stdin=1), but if
they run in parallel, the end-user would not know whom they are
typing to (and which output lines are the prompts they are expected
to respond to).

In the longer term, there are multiple possible action items.

 * We probably would want to design a bit better anti-interleaving
   machinery than "buffer everything and show only after the process
   exists", if we want to keep using the parallel subprocess API.
   And that would help the original "do this thing in multiple
   submodules at the same time" use case, too.  

 * We should teach hooks API to make it _optional_ to use the
   parallel subprocess API.  If we are not spawning hooks in
   parallel today, there is no reason to incur this regression by
   using the parallel subprocess API---this was a needress bug, and
   I am angry.

 * the hooks API should learn a mechanism for multiple hooks to
   coordinate their executions.  Perhaps they indicate their
   preference if they are OK to be run in parallel, and those that
   want isolation will be run one-at-a-time before or after others
   run in parallel, or something.

 * The hooks API should learn a mechanism for us to tell what
   execution environment they are in.  Ideally, the hooks, if it is
   sane to run under the parallel subprocess API, shouldn't have
   been learning if they are talking to an interactive human user by
   looking at isatty(), but we should have been explicitly telling
   them that they are, perhaps by exporting an environment
   variable.  There may probably be more clue hooks writers want
   other than "am I talking to human user?" that we would want to
   enumerate before going this route.

Thanks for analyzing.