On 06.01.2017 at 20:41, Jeff King wrote:
On Fri, Jan 06, 2017 at 03:39:59PM +0100, Johannes Sixt wrote:
diff --git a/run-command.c b/run-command.c
index ca905a9e80..db47c429b7 100644
--- a/run-command.c
+++ b/run-command.c
@@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;
 static void cleanup_children(int sig, int in_signal)
 {
+	struct child_to_clean *children_to_wait_for = NULL;
+
 	while (children_to_clean) {
 		struct child_to_clean *p = children_to_clean;
 		children_to_clean = p->next;
@@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
 		}
 		kill(p->pid, sig);
+		p->next = children_to_wait_for;
+		children_to_wait_for = p;
+	}
+
+	while (children_to_wait_for) {
+		struct child_to_clean *p = children_to_wait_for;
+		children_to_wait_for = p->next;
+
+		while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
+			; /* spin waiting for process exit or error */
+
 		if (!in_signal)
 			free(p);
 	}
This looks like the minimal change necessary. I wonder, though, whether the
new local variable is really required. Wouldn't it be sufficient to walk the
children_to_clean chain twice?
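
For illustration only, the two-pass idea might look roughly like the sketch below. It assumes a simplified struct child_to_clean with just pid and next; the names spawn_idle_child(), track_child() and cleanup_children_two_pass() are hypothetical and not taken from run-command.c.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified stand-in for the list node in run-command.c (assumption). */
struct child_to_clean {
	pid_t pid;
	struct child_to_clean *next;
};

static struct child_to_clean *children_to_clean;

/* Hypothetical helper: fork a child that idles until a signal kills it. */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();
	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/* Hypothetical helper: prepend a child to the cleanup list. */
static void track_child(pid_t pid)
{
	struct child_to_clean *p = malloc(sizeof(*p));

	assert(pid > 0);
	if (!p)
		abort();
	p->pid = pid;
	p->next = children_to_clean;
	children_to_clean = p;
}

/* Two-pass variant: no second list head, the chain is walked twice. */
static void cleanup_children_two_pass(int sig, int in_signal)
{
	struct child_to_clean *p;

	/* Pass 1: signal every child; the list stays intact. */
	for (p = children_to_clean; p; p = p->next)
		kill(p->pid, sig);

	/* Pass 2: unlink each entry, then wait for that child. */
	while (children_to_clean) {
		p = children_to_clean;
		children_to_clean = p->next;

		while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
			; /* retry if the wait was interrupted */

		if (!in_signal)
			free(p);
	}
}
```

Because the list is still fully linked during the first pass, a second signal arriving at that point would walk the same entries again.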
Yeah, I considered that. The fact that we disassemble the list in the
first loop has two side effects:
1. It lets us free the list as we go (for the !in_signal case).
2. If we were to get another signal, it makes us sort-of reentrant. We
will only kill and wait for each pid once.
Obviously (1) moves down to the lower loop, but I was trying to preserve
(2). I'm not sure if it is worth bothering, though.
Makes sense.
The way we pull
items off of the list is certainly not atomic (it does shorten the race
to a few instructions, though, versus potentially waiting on waitpid()
to return).
My bigger concern with the whole thing is whether we could hit some sort
of deadlock if the child doesn't die when we send it a signal. E.g.,
imagine we have a pipe open to the child and somebody sends SIGTERM to
us. We propagate SIGTERM to the child, and then waitpid() for it. The
child decides to ignore our SIGTERM for some reason and keep reading
until EOF on the pipe. It won't ever get it, and the two processes will
hang forever.
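
A small stand-alone sketch of that scenario, under the assumption of a child that ignores SIGTERM and reads its pipe until EOF. The function name child_outlives_sigterm_until_eof() is made up for this example. It probes the child with WNOHANG rather than blocking, since a blocking waitpid() at that point is exactly the hang described above.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child was still running after SIGTERM and exited
 * cleanly only after the parent closed the pipe, 0 if it behaved
 * differently, and -1 if the setup failed.
 */
static int child_outlives_sigterm_until_eof(void)
{
	int fds[2];
	int status;
	pid_t pid;
	void (*old_handler)(int);

	if (pipe(fds) < 0)
		return -1;

	/*
	 * An ignored disposition is inherited across fork(), so the
	 * child ignores SIGTERM from its very first instruction.
	 */
	old_handler = signal(SIGTERM, SIG_IGN);
	pid = fork();
	if (pid < 0) {
		signal(SIGTERM, old_handler);
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		char buf[64];

		close(fds[1]);
		while (read(fds[0], buf, sizeof(buf)) > 0)
			; /* keep reading until EOF */
		_exit(0);
	}
	signal(SIGTERM, old_handler);
	close(fds[0]);

	kill(pid, SIGTERM);

	/* Non-blocking probe: 0 means the child is still running. */
	if (waitpid(pid, &status, WNOHANG) != 0) {
		close(fds[1]);
		return 0;
	}

	/*
	 * A blocking waitpid() here would never return: the child waits
	 * for EOF while we still hold the write end. Closing it is what
	 * lets the child finish.
	 */
	close(fds[1]);
	if (waitpid(pid, &status, 0) != pid)
		return 0;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```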
You can argue perhaps that the child is broken in that case. And I doubt
this could trigger when running a git sub-command. But we may add more
children in the future. Right now we use it for the new multi-file
clean/smudge filters. They use the hook feature to close the
descriptors, but note that that won't run in the in_signal case.
So I dunno. Maybe this waiting should be restricted only to certain
cases like executing git sub-commands.
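
One way such a restriction could be sketched is a per-child opt-in flag. Everything below is an assumption for illustration: the wait_after_kill field, the simplified struct, and the helper names are hypothetical and not part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified list node with a hypothetical opt-in flag. */
struct child_to_clean {
	pid_t pid;
	int wait_after_kill; /* assumption: set only for selected children */
	struct child_to_clean *next;
};

static struct child_to_clean *children_to_clean;

/* Hypothetical helper: fork a child that idles until a signal kills it. */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();
	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/* Hypothetical helper: prepend a child to the cleanup list. */
static void track_child(pid_t pid, int wait_after_kill)
{
	struct child_to_clean *p = malloc(sizeof(*p));

	assert(pid > 0);
	if (!p)
		abort();
	p->pid = pid;
	p->wait_after_kill = wait_after_kill;
	p->next = children_to_clean;
	children_to_clean = p;
}

/* Signal every child, but wait only for those that opted in. */
static void cleanup_children_selective(int sig, int in_signal)
{
	struct child_to_clean *children_to_wait_for = NULL;

	while (children_to_clean) {
		struct child_to_clean *p = children_to_clean;
		children_to_clean = p->next;

		kill(p->pid, sig);

		if (p->wait_after_kill) {
			p->next = children_to_wait_for;
			children_to_wait_for = p;
		} else if (!in_signal) {
			free(p);
		}
	}

	while (children_to_wait_for) {
		struct child_to_clean *p = children_to_wait_for;
		children_to_wait_for = p->next;

		while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
			; /* retry if the wait was interrupted */

		if (!in_signal)
			free(p);
	}
}
```

With this shape, a child that ignores the signal can only stall the parent if its caller explicitly asked for the wait.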
I've given it some thought.
In general, I think it is wrong to wait for child processes when a
signal was received. After all, it is the purpose of a (deadly) signal
to have the process go away. There may be programs that know better,
like less, but git should not attempt to know better in general.
We do apply some special behavior for certain cases like we do for the
pager. And now the case with aliases is another special situation. The
parent git process only delegates to the child, and as such it is
reasonable that it binds its lifetime to the first child, which
executes the expanded alias.
-- Hannes