Re: Regression: Ctrl-c from the pager in an alias exits it

Jeff King <peff@xxxxxxxx> · Fri, 6 Jan 2017 14:41:15 -0500

On Fri, Jan 06, 2017 at 03:39:59PM +0100, Johannes Sixt wrote:

> > diff --git a/run-command.c b/run-command.c
> > index ca905a9e80..db47c429b7 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;
> > 
> >  static void cleanup_children(int sig, int in_signal)
> >  {
> > +	struct child_to_clean *children_to_wait_for = NULL;
> > +
> >  	while (children_to_clean) {
> >  		struct child_to_clean *p = children_to_clean;
> >  		children_to_clean = p->next;
> > @@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
> >  		}
> > 
> >  		kill(p->pid, sig);
> > +		p->next = children_to_wait_for;
> > +		children_to_wait_for = p;
> > +	}
> > +
> > +	while (children_to_wait_for) {
> > +		struct child_to_clean *p = children_to_wait_for;
> > +		children_to_wait_for = p->next;
> > +
> > +		while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
> > +			; /* spin waiting for process exit or error */
> > +
> >  		if (!in_signal)
> >  			free(p);
> >  	}
> > 
> 
> This looks like the minimal change necessary. I wonder, though, whether the
> new local variable is really required. Wouldn't it be sufficient to walk the
> children_to_clean chain twice?

Yeah, I considered that. The fact that we disassemble the list in the
first loop has two side effects:

  1. It lets us free the list as we go (for the !in_signal case).

  2. If we were to get another signal, it makes us sort-of reentrant. We
     will only kill and wait for each pid once.

Obviously (1) moves down to the lower loop, but I was trying to preserve
(2). I'm not sure if it is worth bothering, though. The way we pull
items off of the list is certainly not atomic (it does shorten the race
to a few instructions, though, versus potentially waiting on waitpid()
to return).

My bigger concern with the whole thing is whether we could hit some sort
of deadlock if the child doesn't die when we send it a signal. E.g.,
imagine we have a pipe open to the child and somebody sends SIGTERM to
us. We propagate SIGTERM to the child, and then waitpid() for it. The
child decides to ignore our SIGTERM for some reason and keep reading
until EOF on the pipe. It won't ever get it, and the two processes will
hang forever.

You can argue perhaps that the child is broken in that case. And I doubt
this could trigger when running a git sub-command. But we may add more
children in the future. Right now we use it for the new multi-file
clean/smudge filters. They use the hook feature to close the
descriptors, but note that that won't run in the in_signal case.

So I dunno. Maybe this waiting should be restricted only to certain
cases like executing git sub-commands.

-Peff