On Fri, 04 Sep 2015 11:23:55 +0200, Jeff King wrote: > > On Fri, Sep 04, 2015 at 07:52:21AM +0200, Takashi Iwai wrote: > > > -- 8< -- > > From: Takashi Iwai <tiwai@xxxxxxx> > > Subject: [PATCH] pager: don't use unsafe functions in signal handlers > > > > Since the commit [a3da8821208d: pager: do wait_for_pager on signal > > death], we call wait_for_pager() in the pager's signal handler. The > > recent bug report revealed that this causes a deadlock in glibc at > > aborting "git log" [*1]. When this happens, git process is left > > unterminated, and it can't be killed by SIGTERM but only by SIGKILL. > > > > The problem is that wait_for_pager() function does more than waiting > > for pager process's termination, but it does cleanups and printing > > errors. Unfortunately, the functions that may be used in a signal > > handler are very limited [*2]. Particularly, malloc(), free() and the > > variants can't be used in a signal handler because they take a mutex > > internally in glibc. This was the cause of the deadlock above. Other > > than the direct calls of malloc/free, many functions calling > > malloc/free can't be used. strerror() is such one, either. > > I think this approach is the only real solution here (and I agree it is > a real-world problem). Unfortunately, it is the tip of the iceberg. > Looking at other signal handlers, there are lots of other potential > problems. For example, here are the first few I found by grepping: > > - clone.c:remove_junk uses strbufs; these are doing useful work, and > can't just be skipped if we are in a signal handler > > - fetch calls transport_unlock_pack, which has a free (which can be > skipped) > > - repack uses remove_temporary_files, which uses a strbuf > > and so on. Yes, we need to revise all signal handlers... > > Also the usage of fflush() and printf() in a signal handler is bad, > > although it seems working so far. In a safer side, we should avoid > > them, too. > > I'd be surprised if they are safe; stdio definitely involves locking. > > Perhaps we should reconsider whether f4c3edc (vreportf: avoid > intermediate buffer, 2015-08-11) is a good idea. Note that snprintf is > not on the list of safe functions, but I imagine that in practice it is > fine. Though just avoiding error()/warning() in signal handlers might be > a more practical solution anyway. > > > diff --git a/pager.c b/pager.c > > index 27d4c8a17aa1..12d17af73745 100644 > > --- a/pager.c > > +++ b/pager.c > > @@ -26,7 +26,10 @@ static void wait_for_pager(void) > > > > static void wait_for_pager_signal(int signo) > > { > > - wait_for_pager(); > > + /* signal EOF to pager */ > > + close(1); > > + close(2); > > + finish_command_in_signal(&pager_process); > > sigchain_pop(signo); > > raise(signo); > > } > > Hmm, is there is any reason to just pass an "in_signal" flag to > wait_for_pager(), to avoid duplicating the logic? Just because wait_for_pager() itself is an atexit hook that can't take an argument, so we'd need to split to a new function. I don't mind either way. The revised patch is below. thanks, Takashi -- 8< -- From: Takashi Iwai <tiwai@xxxxxxx> Subject: [PATCH v2] pager: don't use unsafe functions in signal handlers Since the commit [a3da8821208d: pager: do wait_for_pager on signal death], we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@xxxxxxx> --- pager.c | 22 ++++++++++++++++------ run-command.c | 25 +++++++++++++++++-------- run-command.h | 1 + 3 files changed, 34 insertions(+), 14 deletions(-) diff --git a/pager.c b/pager.c index 27d4c8a17aa1..e425070528f4 100644 --- a/pager.c +++ b/pager.c @@ -14,19 +14,29 @@ static const char *pager_argv[] = { NULL, NULL }; static struct child_process pager_process = CHILD_PROCESS_INIT; -static void wait_for_pager(void) +static void wait_for_pager(int in_signal) { - fflush(stdout); - fflush(stderr); + if (!in_signal) { + fflush(stdout); + fflush(stderr); + } /* signal EOF to pager */ close(1); close(2); - finish_command(&pager_process); + if (in_signal) + finish_command_in_signal(&pager_process); + else + finish_command(&pager_process); +} + +static void wait_for_pager_atexit(void) +{ + wait_for_pager(0); } static void wait_for_pager_signal(int signo) { - wait_for_pager(); + wait_for_pager(1); sigchain_pop(signo); raise(signo); } @@ -90,7 +100,7 @@ void setup_pager(void) /* this makes sure that the parent terminates after the pager */ sigchain_push_common(wait_for_pager_signal); - atexit(wait_for_pager); + atexit(wait_for_pager_atexit); } int pager_in_use(void) diff --git a/run-command.c b/run-command.c index 3277cf797ed4..e09275bd9e36 100644 --- a/run-command.c +++ b/run-command.c @@ -18,26 +18,27 @@ struct child_to_clean { static struct child_to_clean *children_to_clean; static int installed_child_cleanup_handler; -static void cleanup_children(int sig) +static void cleanup_children(int sig, int in_signal) { while (children_to_clean) { struct child_to_clean *p = children_to_clean; children_to_clean = p->next; kill(p->pid, sig); - free(p); + if (!in_signal) + free(p); } } static void cleanup_children_on_signal(int sig) { - cleanup_children(sig); + cleanup_children(sig, 1); sigchain_pop(sig); raise(sig); } static void cleanup_children_on_exit(void) { - cleanup_children(SIGTERM); + cleanup_children(SIGTERM, 0); } static void mark_child_for_cleanup(pid_t pid) @@ -220,7 +221,7 @@ static inline void set_cloexec(int fd) fcntl(fd, F_SETFD, flags | FD_CLOEXEC); } -static int wait_or_whine(pid_t pid, const char *argv0) +static int wait_or_whine(pid_t pid, const char *argv0, int in_signal) { int status, code = -1; pid_t waiting; @@ -228,6 +229,8 @@ static int wait_or_whine(pid_t pid, const char *argv0) while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR) ; /* nothing */ + if (in_signal) + return 0; if (waiting < 0) { failed_errno = errno; @@ -437,7 +440,7 @@ fail_pipe: * At this point we know that fork() succeeded, but execvp() * failed. Errors have been reported to our stderr. */ - wait_or_whine(cmd->pid, cmd->argv[0]); + wait_or_whine(cmd->pid, cmd->argv[0], 0); failed_errno = errno; cmd->pid = -1; } @@ -536,12 +539,18 @@ fail_pipe: int finish_command(struct child_process *cmd) { - int ret = wait_or_whine(cmd->pid, cmd->argv[0]); + int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0); argv_array_clear(&cmd->args); argv_array_clear(&cmd->env_array); return ret; } +int finish_command_in_signal(struct child_process *cmd) +{ + return wait_or_whine(cmd->pid, cmd->argv[0], 1); +} + + int run_command(struct child_process *cmd) { int code; @@ -772,7 +781,7 @@ error: int finish_async(struct async *async) { #ifdef NO_PTHREADS - return wait_or_whine(async->pid, "child process"); + return wait_or_whine(async->pid, "child process", 0); #else void *ret = (void *)(intptr_t)(-1); diff --git a/run-command.h b/run-command.h index 5b4425a3cbe1..275d35c442ac 100644 --- a/run-command.h +++ b/run-command.h @@ -50,6 +50,7 @@ void child_process_init(struct child_process *); int start_command(struct child_process *); int finish_command(struct child_process *); +int finish_command_in_signal(struct child_process *); int run_command(struct child_process *); /* -- 2.5.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html