Dmitry Osipenko <digetx@xxxxxxxxx> writes: > 14.12.2021 01:53, Eric W. Biederman пишет: >> Simplify the code that allows SIGKILL during coredumps to terminate >> the coredump. As far as I can tell I have avoided breaking it >> by dumb luck. >> >> Historically with all of the other threads stopping in exit_mm the >> wants_signal loop in complete_signal would find the dumper task and >> then complete_signal would wake the dumper task with signal_wake_up. >> >> After moving the coredump_task_exit above the setting of PF_EXITING in >> commit 92307383082d ("coredump: Don't perform any cleanups before >> dumping core") wants_signal will consider all of the threads in a >> multi-threaded process for waking up, not just the core dumping task. >> >> Luckily complete_signal short circuits SIGKILL during a coredump marks >> every thread with SIGKILL and signal_wake_up. This code is arguably >> buggy however as it tries to skip creating a group exit when is already >> present, and it fails that a coredump is in progress. >> >> Ever since commit 06af8679449d ("coredump: Limit what can interrupt >> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING >> set on the dumper task but also SIGKILL set in it's pending bitmap. >> This means that if the code is ever fixed not to short-circuit and >> kill a process after it has already been killed the special case >> for SIGKILL during a coredump will be broken. >> >> Sort all of this out by making the coredump special case more special, >> and perform all of the work in prepare_signal and leave the rest of >> the signal delivery path out of it. >> >> In prepare_signal when the process coredumping is sent SIGKILL find >> the task performing the coredump and use sigaddset and signal_wake_up >> to ensure that task reports fatal_signal_pending. >> >> Return false from prepare_signal to tell the rest of the signal >> delivery path to ignore the signal. >> >> Update wait_for_dump_helpers to perform a wait_event_killable wait >> so that if signal_pending gets set spuriously the wait will not >> be interrupted unless fatal_signal_pending is true. >> >> I have tested this and verified I did not break SIGKILL during >> coredumps by accident (before or after this change). I actually >> thought I had and I had to figure out what I had misread that kept >> SIGKILL during coredumps working. >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >> --- >> fs/coredump.c | 4 ++-- >> kernel/signal.c | 11 +++++++++-- >> 2 files changed, 11 insertions(+), 4 deletions(-) >> >> diff --git a/fs/coredump.c b/fs/coredump.c >> index a6b3c196cdef..7b91fb32dbb8 100644 >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped) >> static bool dump_interrupted(void) >> { >> /* >> - * SIGKILL or freezing() interrupt the coredumping. Perhaps we >> + * SIGKILL or freezing() interrupted the coredumping. Perhaps we >> * can do try_to_freeze() and check __fatal_signal_pending(), >> * but then we need to teach dump_write() to restart and clear >> * TIF_SIGPENDING. >> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file) >> * We actually want wait_event_freezable() but then we need >> * to clear TIF_SIGPENDING and improve dump_interrupted(). >> */ >> - wait_event_interruptible(pipe->rd_wait, pipe->readers == 1); >> + wait_event_killable(pipe->rd_wait, pipe->readers == 1); >> >> pipe_lock(pipe); >> pipe->readers--; >> diff --git a/kernel/signal.c b/kernel/signal.c >> index 8272cac5f429..7e305a8ec7c2 100644 >> --- a/kernel/signal.c >> +++ b/kernel/signal.c >> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) >> sigset_t flush; >> >> if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) { >> - if (!(signal->flags & SIGNAL_GROUP_EXIT)) >> - return sig == SIGKILL; >> + struct core_state *core_state = signal->core_state; >> + if (core_state) { >> + if (sig == SIGKILL) { >> + struct task_struct *dumper = core_state->dumper.task; >> + sigaddset(&dumper->pending.signal, SIGKILL); >> + signal_wake_up(dumper, 1); >> + } >> + return false; >> + } >> /* >> * The process is in the middle of dying, nothing to do. >> */ >> > > Hi, > > This patch breaks userspace, in particular it breaks gst-plugin-scanner > of GStreamer which hangs now on next-20211224. IIUC, this tool builds a > registry of good/working GStreamer plugins by loading them and > blacklisting those that don't work (crash). Before the hang I see > systemd-coredump process running, taking snapshot of gst-plugin-scanner > and then gst-plugin-scanner gets stuck. > > Bisection points at this patch, reverting it restores > gst-plugin-scanner. Systemd-coredump still running, but there is no hang > anymore and everything works properly as before. > > I'm seeing this problem on ARM32 and haven't checked other arches. > Please fix, thanks in advance. I have not yet been able to figure out how to run gst-pluggin-scanner in a way that triggers this yet. In truth I can't figure out how to run gst-pluggin-scanner in a useful way. I am going to set up some unit tests and see if I can reproduce your hang another way, but if you could give me some more information on what you are doing to trigger this I would appreciate it. Eric