Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case

Dmitry Osipenko <digetx@xxxxxxxxx> · Tue, 4 Jan 2022 09:30:45 +0300

14.12.2021 01:53, Eric W. Biederman пишет:
> Simplify the code that allows SIGKILL during coredumps to terminate
> the coredump.  As far as I can tell I have avoided breaking it
> by dumb luck.
> 
> Historically with all of the other threads stopping in exit_mm the
> wants_signal loop in complete_signal would find the dumper task and
> then complete_signal would wake the dumper task with signal_wake_up.
> 
> After moving the coredump_task_exit above the setting of PF_EXITING in
> commit 92307383082d ("coredump: Don't perform any cleanups before
> dumping core") wants_signal will consider all of the threads in a
> multi-threaded process for waking up, not just the core dumping task.
> 
> Luckily complete_signal short circuits SIGKILL during a coredump marks
> every thread with SIGKILL and signal_wake_up.  This code is arguably
> buggy however as it tries to skip creating a group exit when is already
> present, and it fails that a coredump is in progress.
> 
> Ever since commit 06af8679449d ("coredump: Limit what can interrupt
> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING
> set on the dumper task but also SIGKILL set in it's pending bitmap.
> This means that if the code is ever fixed not to short-circuit and
> kill a process after it has already been killed the special case
> for SIGKILL during a coredump will be broken.
> 
> Sort all of this out by making the coredump special case more special,
> and perform all of the work in prepare_signal and leave the rest of
> the signal delivery path out of it.
> 
> In prepare_signal when the process coredumping is sent SIGKILL find
> the task performing the coredump and use sigaddset and signal_wake_up
> to ensure that task reports fatal_signal_pending.
> 
> Return false from prepare_signal to tell the rest of the signal
> delivery path to ignore the signal.
> 
> Update wait_for_dump_helpers to perform a wait_event_killable wait
> so that if signal_pending gets set spuriously the wait will not
> be interrupted unless fatal_signal_pending is true.
> 
> I have tested this and verified I did not break SIGKILL during
> coredumps by accident (before or after this change).  I actually
> thought I had and I had to figure out what I had misread that kept
> SIGKILL during coredumps working.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
>  fs/coredump.c   |  4 ++--
>  kernel/signal.c | 11 +++++++++--
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index a6b3c196cdef..7b91fb32dbb8 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped)
>  static bool dump_interrupted(void)
>  {
>  	/*
> -	 * SIGKILL or freezing() interrupt the coredumping. Perhaps we
> +	 * SIGKILL or freezing() interrupted the coredumping. Perhaps we
>  	 * can do try_to_freeze() and check __fatal_signal_pending(),
>  	 * but then we need to teach dump_write() to restart and clear
>  	 * TIF_SIGPENDING.
> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file)
>  	 * We actually want wait_event_freezable() but then we need
>  	 * to clear TIF_SIGPENDING and improve dump_interrupted().
>  	 */
> -	wait_event_interruptible(pipe->rd_wait, pipe->readers == 1);
> +	wait_event_killable(pipe->rd_wait, pipe->readers == 1);
>  
>  	pipe_lock(pipe);
>  	pipe->readers--;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8272cac5f429..7e305a8ec7c2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>  	sigset_t flush;
>  
>  	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
> -		if (!(signal->flags & SIGNAL_GROUP_EXIT))
> -			return sig == SIGKILL;
> +		struct core_state *core_state = signal->core_state;
> +		if (core_state) {
> +			if (sig == SIGKILL) {
> +				struct task_struct *dumper = core_state->dumper.task;
> +				sigaddset(&dumper->pending.signal, SIGKILL);
> +				signal_wake_up(dumper, 1);
> +			}
> +			return false;
> +		}
>  		/*
>  		 * The process is in the middle of dying, nothing to do.
>  		 */
> 

Hi,

This patch breaks userspace, in particular it breaks gst-plugin-scanner
of GStreamer which hangs now on next-20211224. IIUC, this tool builds a
registry of good/working GStreamer plugins by loading them and
blacklisting those that don't work (crash). Before the hang I see
systemd-coredump process running, taking snapshot of gst-plugin-scanner
and then gst-plugin-scanner gets stuck.

Bisection points at this patch, reverting it restores
gst-plugin-scanner. Systemd-coredump still running, but there is no hang
anymore and everything works properly as before.

I'm seeing this problem on ARM32 and haven't checked other arches.
Please fix, thanks in advance.