On Wed, Apr 20, 2022 at 9:42 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Phillip Wood <phillip.wood123@xxxxxxxxx> writes:
>
> >> It *is* true that run-command.c:pp_start_one() sets child_process:err=-1
> >> for the child and run-command.c:run_hook_ve() didn't do that; that -1
> >> means that start_command() will create a new fd for the child's
> >> stderr. Since run_hook_ve() didn't care about the child's stderr
> >> before, I wonder if that is why? Could it be that now that we're
> >> processing the child's stderr, the child no longer thinks stderr is
> >> a tty, because the parent is consuming its output?
> >
> > Exactly, stderr is redirected to a pipe so that we can buffer the
> > output from each process and then write it to the real stdout when
> > the process has finished, to avoid the output from different
> > processes getting mixed together. Ideally, in this case we'd see
> > that stdout is a tty and create a pty rather than a pipe when
> > buffering the output from the process.
>
> Ah, thanks, and sigh. That means this was an unintended regression
> caused by the use of the parallel infrastructure, mixed with a bit of
> "the original problem reporter wrote the hook properly, so that when
> it is not connected to a terminal (such as in this new implementation)
> it refrains from doing terminal-y things like coloring, so everything
> is working as intended" ;-).
>
> IIRC, the parallel subprocess stuff was invented to spawn, en masse,
> multiple tasks we internally need (like "check out these submodules")
> that are not interactive (hence do not need access to stdin), and the
> output buffering is there to avoid interleaving the output, which
> would make it unreadable.
>
> Use of the parallel subprocess API means that we inherently cannot
> give access to the standard input to the hooks.
> The users of the original run_hooks_ve() API would be OK with that,
> because it did .no_stdin=1 before the problematic hooks API rewrite,
> but I wonder what our plans should be for hooks that want to go
> interactive. They could open /dev/tty themselves (and that would have
> been the only way to go interactive even in the old world order, so
> it is perfectly acceptable to keep it that way with .no_stdin=1), but
> if they run in parallel, the end-user would not know whom they are
> typing to (and which output lines are the prompts they are expected
> to respond to).
>
> In the longer term, there are multiple possible action items.
>
>  * We probably would want to design a bit better anti-interleaving
>    machinery than "buffer everything and show only after the process
>    exits", if we want to keep using the parallel subprocess API.
>    And that would help the original "do this thing in multiple
>    submodules at the same time" use case, too.

I've noticed this too, but for very noisy things which are
parallelized, I'm not sure a better user experience is possible. I
suppose we could pick the "first" job in the task queue and print its
output as it comes in, so that users are aware that *something* is
happening?

  [job 0 starts]
  [job 1 starts]
  job 0 says 0-foo
  [job 1 says 1-foo, but it's buffered]
  job 0 says 0-bar
  [job 1 says 1-bar, but it's buffered]
  [job 0 finishes]
  [we replay the buffer from job 1 so far:]
  job 1 says 1-foo
  job 1 says 1-bar
  job 1 says 1-baz
  [job 1 finishes]

I think it could be possible, but then job 1 still will never learn
that it's on a tty, because its output is being buffered to prevent
interleaving, even if we have the illusion of non-buffering.

>  * We should teach the hooks API to make it _optional_ to use the
>    parallel subprocess API. If we are not spawning hooks in parallel
>    today, there is no reason to incur this regression by using the
>    parallel subprocess API---this was a needless bug, and I am angry.
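For illustration, the "stream the first job, buffer the rest" scheme
described above can be modeled with a toy shell snippet; job0 and job1
are stand-in commands here, not git's actual parallel machinery:

```shell
#!/bin/sh
# Toy model of "stream the first job, buffer the rest".  job0 and job1
# stand in for parallel hook processes; only job1's output is buffered,
# so job0's lines reach the user as they happen.
job0() { echo "job 0 says 0-foo"; echo "job 0 says 0-bar"; }
job1() { echo "job 1 says 1-foo"; echo "job 1 says 1-bar"; }

job0 &                 # the "first" job streams straight through
buf=$(job1)            # later jobs write to a pipe, never seeing a tty
wait                   # wait for the streaming job to exit...
printf '%s\n' "$buf"   # ...then replay the buffered output in one block
```

Note that job1's output still goes through a pipe (the command
substitution), so isatty() inside job1 remains false - the "illusion of
non-buffering" caveat mentioned above.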
To counter, I think that having hooks invoked via two different
mechanisms, depending on how many are provided or whether they are
parallelized, would be a mess to debug and maintain. I still stand by
the decision to use the parallel subprocess API - which it was
reasonable to expect to behave the same way when jobs=1 - and I think
we should continue to use it. It simplifies the hook code
significantly.

>  * The hooks API should learn a mechanism for multiple hooks to
>    coordinate their executions. Perhaps they indicate their
>    preference if they are OK to be run in parallel, and those that
>    want isolation will be run one-at-a-time before or after others
>    run in parallel, or something.

There is such a mechanism for hooks overall, but not yet for
individual hooks. I know we discussed it at length[1] before, and
decided it would be okay to figure this out later on. I suppose "later
on" may have come :)

>  * The hooks API should learn a mechanism for us to tell hooks what
>    execution environment they are in. Ideally, hooks that are sane
>    to run under the parallel subprocess API shouldn't have to learn
>    whether they are talking to an interactive human user by looking
>    at isatty(); instead, we should be explicitly telling them that
>    they are, perhaps by exporting an environment variable. There are
>    probably more clues hook writers want, other than "am I talking
>    to a human user?", that we would want to enumerate before going
>    this route.

Hm. I was going to mention that Ævar and I discussed the possibility
of setting an environment variable for hook child processes, telling
them which hook they are being run as - e.g.
"GIT_HOOK=prepare-commit-msg" - but I suppose that relying on that
alone doesn't tell us anything about whether the parent is attached
to a tty.

I agree it could be very useful to simply pass GIT_PARENT_ISATTY to
hooks (and I suppose other child processes). Could we simply do that
from start_command() or something else deep in the run-command.h
machinery?
Then Anthony's use case becomes

  if [ -t 1 ] || [ -n "$GIT_PARENT_ISATTY" ]
  ...

and there is no need to examine the Git version.

 - Emily

1: https://lore.kernel.org/git/20210527000856.695702-2-emilyshaffer%40google.com
   under "Parallelization with dependencies" (and preceding
   conversations)
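A hook following the sketch above could be written as follows; note
that GIT_PARENT_ISATTY is only a variable proposed in this thread, not
something git actually exports:

```shell
#!/bin/sh
# Hypothetical hook guard combining the two conditions from Emily's
# snippet.  GIT_PARENT_ISATTY is a *proposed* variable from this
# thread; git does not export it (as of this discussion).
is_interactive() {
    [ -t 1 ] || [ -n "$GIT_PARENT_ISATTY" ]
}

if is_interactive; then
    echo "interactive: safe to prompt or colorize"
else
    echo "non-interactive: plain output only"
fi
```

When stdout is a pipe (as under the parallel hook runner) and the
variable is unset, the non-interactive branch runs; setting
GIT_PARENT_ISATTY in the parent would flip it without any tty test.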