Re: glibc mutex deadlock in signal handler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 04 Sep 2015 11:23:55 +0200,
Jeff King wrote:
> 
> On Fri, Sep 04, 2015 at 07:52:21AM +0200, Takashi Iwai wrote:
> 
> > -- 8< --
> > From: Takashi Iwai <tiwai@xxxxxxx>
> > Subject: [PATCH] pager: don't use unsafe functions in signal handlers
> > 
> > Since the commit [a3da8821208d: pager: do wait_for_pager on signal
> > death], we call wait_for_pager() in the pager's signal handler.  The
> > recent bug report revealed that this causes a deadlock in glibc at
> > aborting "git log" [*1].  When this happens, git process is left
> > unterminated, and it can't be killed by SIGTERM but only by SIGKILL.
> > 
> > The problem is that wait_for_pager() function does more than waiting
> > for pager process's termination, but it does cleanups and printing
> > errors.  Unfortunately, the functions that may be used in a signal
> > handler are very limited [*2].  Particularly, malloc(), free() and the
> > variants can't be used in a signal handler because they take a mutex
> > internally in glibc.  This was the cause of the deadlock above.  Other
> > than the direct calls of malloc/free, many functions calling
> > malloc/free can't be used.  strerror() is such one, either.
> 
> I think this approach is the only real solution here (and I agree it is
> a real-world problem). Unfortunately, it is the tip of the iceberg.
> Looking at other signal handlers, there are lots of other potential
> problems. For example, here are the first few I found by grepping:
> 
>   - clone.c:remove_junk uses strbufs; these are doing useful work, and
>     can't just be skipped if we are in a signal handler
> 
>   - fetch calls transport_unlock_pack, which has a free (which can be
>     skipped)
> 
>   - repack uses remove_temporary_files, which uses a strbuf
> 
> and so on.

Yes, we need to revise all signal handlers...

> > Also the usage of fflush() and printf() in a signal handler is bad,
> > although it seems working so far.  In a safer side, we should avoid
> > them, too.
> 
> I'd be surprised if they are safe; stdio definitely involves locking.
> 
> Perhaps we should reconsider whether f4c3edc (vreportf: avoid
> intermediate buffer, 2015-08-11) is a good idea.  Note that snprintf is
> not on the list of safe functions, but I imagine that in practice it is
> fine. Though just avoiding error()/warning() in signal handlers might be
> a more practical solution anyway.
> 
> > diff --git a/pager.c b/pager.c
> > index 27d4c8a17aa1..12d17af73745 100644
> > --- a/pager.c
> > +++ b/pager.c
> > @@ -26,7 +26,10 @@ static void wait_for_pager(void)
> >  
> >  static void wait_for_pager_signal(int signo)
> >  {
> > -	wait_for_pager();
> > +	/* signal EOF to pager */
> > +	close(1);
> > +	close(2);
> > +	finish_command_in_signal(&pager_process);
> >  	sigchain_pop(signo);
> >  	raise(signo);
> >  }
> 
> Hmm, is there is any reason to just pass an "in_signal" flag to
> wait_for_pager(), to avoid duplicating the logic?

Just because wait_for_pager() itself is an atexit hook that can't take
an argument, so we'd need to split to a new function.  I don't mind
either way.  The revised patch is below.


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@xxxxxxx>
Subject: [PATCH v2] pager: don't use unsafe functions in signal handlers

Since the commit [a3da8821208d: pager: do wait_for_pager on signal
death], we call wait_for_pager() in the pager's signal handler.  The
recent bug report revealed that this causes a deadlock in glibc at
aborting "git log" [*1].  When this happens, git process is left
unterminated, and it can't be killed by SIGTERM but only by SIGKILL.

The problem is that wait_for_pager() function does more than waiting
for pager process's termination, but it does cleanups and printing
errors.  Unfortunately, the functions that may be used in a signal
handler are very limited [*2].  Particularly, malloc(), free() and the
variants can't be used in a signal handler because they take a mutex
internally in glibc.  This was the cause of the deadlock above.  Other
than the direct calls of malloc/free, many functions calling
malloc/free can't be used.  strerror() is such one, either.

Also the usage of fflush() and printf() in a signal handler is bad,
although it seems working so far.  In a safer side, we should avoid
them, too.

This patch tries to reduce the calls of such functions in signal
handlers.  wait_for_signal() takes a flag and avoids the unsafe
calls.   Also, finish_command_in_signal() is introduced for the
same reason.  There the free() calls are removed, and only waits for
the children without whining at errors.

[*1] https://bugzilla.opensuse.org/show_bug.cgi?id=942297
[*2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03

Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
---
 pager.c       | 22 ++++++++++++++++------
 run-command.c | 25 +++++++++++++++++--------
 run-command.h |  1 +
 3 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/pager.c b/pager.c
index 27d4c8a17aa1..e425070528f4 100644
--- a/pager.c
+++ b/pager.c
@@ -14,19 +14,29 @@
 static const char *pager_argv[] = { NULL, NULL };
 static struct child_process pager_process = CHILD_PROCESS_INIT;
 
-static void wait_for_pager(void)
+static void wait_for_pager(int in_signal)
 {
-	fflush(stdout);
-	fflush(stderr);
+	if (!in_signal) {
+		fflush(stdout);
+		fflush(stderr);
+	}
 	/* signal EOF to pager */
 	close(1);
 	close(2);
-	finish_command(&pager_process);
+	if (in_signal)
+		finish_command_in_signal(&pager_process);
+	else
+		finish_command(&pager_process);
+}
+
+static void wait_for_pager_atexit(void)
+{
+	wait_for_pager(0);
 }
 
 static void wait_for_pager_signal(int signo)
 {
-	wait_for_pager();
+	wait_for_pager(1);
 	sigchain_pop(signo);
 	raise(signo);
 }
@@ -90,7 +100,7 @@ void setup_pager(void)
 
 	/* this makes sure that the parent terminates after the pager */
 	sigchain_push_common(wait_for_pager_signal);
-	atexit(wait_for_pager);
+	atexit(wait_for_pager_atexit);
 }
 
 int pager_in_use(void)
diff --git a/run-command.c b/run-command.c
index 3277cf797ed4..e09275bd9e36 100644
--- a/run-command.c
+++ b/run-command.c
@@ -18,26 +18,27 @@ struct child_to_clean {
 static struct child_to_clean *children_to_clean;
 static int installed_child_cleanup_handler;
 
-static void cleanup_children(int sig)
+static void cleanup_children(int sig, int in_signal)
 {
 	while (children_to_clean) {
 		struct child_to_clean *p = children_to_clean;
 		children_to_clean = p->next;
 		kill(p->pid, sig);
-		free(p);
+		if (!in_signal)
+			free(p);
 	}
 }
 
 static void cleanup_children_on_signal(int sig)
 {
-	cleanup_children(sig);
+	cleanup_children(sig, 1);
 	sigchain_pop(sig);
 	raise(sig);
 }
 
 static void cleanup_children_on_exit(void)
 {
-	cleanup_children(SIGTERM);
+	cleanup_children(SIGTERM, 0);
 }
 
 static void mark_child_for_cleanup(pid_t pid)
@@ -220,7 +221,7 @@ static inline void set_cloexec(int fd)
 		fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
 }
 
-static int wait_or_whine(pid_t pid, const char *argv0)
+static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 {
 	int status, code = -1;
 	pid_t waiting;
@@ -228,6 +229,8 @@ static int wait_or_whine(pid_t pid, const char *argv0)
 
 	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
 		;	/* nothing */
+	if (in_signal)
+		return 0;
 
 	if (waiting < 0) {
 		failed_errno = errno;
@@ -437,7 +440,7 @@ fail_pipe:
 		 * At this point we know that fork() succeeded, but execvp()
 		 * failed. Errors have been reported to our stderr.
 		 */
-		wait_or_whine(cmd->pid, cmd->argv[0]);
+		wait_or_whine(cmd->pid, cmd->argv[0], 0);
 		failed_errno = errno;
 		cmd->pid = -1;
 	}
@@ -536,12 +539,18 @@ fail_pipe:
 
 int finish_command(struct child_process *cmd)
 {
-	int ret = wait_or_whine(cmd->pid, cmd->argv[0]);
+	int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0);
 	argv_array_clear(&cmd->args);
 	argv_array_clear(&cmd->env_array);
 	return ret;
 }
 
+int finish_command_in_signal(struct child_process *cmd)
+{
+	return wait_or_whine(cmd->pid, cmd->argv[0], 1);
+}
+
+
 int run_command(struct child_process *cmd)
 {
 	int code;
@@ -772,7 +781,7 @@ error:
 int finish_async(struct async *async)
 {
 #ifdef NO_PTHREADS
-	return wait_or_whine(async->pid, "child process");
+	return wait_or_whine(async->pid, "child process", 0);
 #else
 	void *ret = (void *)(intptr_t)(-1);
 
diff --git a/run-command.h b/run-command.h
index 5b4425a3cbe1..275d35c442ac 100644
--- a/run-command.h
+++ b/run-command.h
@@ -50,6 +50,7 @@ void child_process_init(struct child_process *);
 
 int start_command(struct child_process *);
 int finish_command(struct child_process *);
+int finish_command_in_signal(struct child_process *);
 int run_command(struct child_process *);
 
 /*
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]