05.01.2022 22:58, Eric W. Biederman пишет: > Dmitry Osipenko <digetx@xxxxxxxxx> writes: > >> 14.12.2021 01:53, Eric W. Biederman пишет: >>> Simplify the code that allows SIGKILL during coredumps to terminate >>> the coredump. As far as I can tell I have avoided breaking it >>> by dumb luck. >>> >>> Historically with all of the other threads stopping in exit_mm the >>> wants_signal loop in complete_signal would find the dumper task and >>> then complete_signal would wake the dumper task with signal_wake_up. >>> >>> After moving the coredump_task_exit above the setting of PF_EXITING in >>> commit 92307383082d ("coredump: Don't perform any cleanups before >>> dumping core") wants_signal will consider all of the threads in a >>> multi-threaded process for waking up, not just the core dumping task. >>> >>> Luckily complete_signal short circuits SIGKILL during a coredump marks >>> every thread with SIGKILL and signal_wake_up. This code is arguably >>> buggy however as it tries to skip creating a group exit when is already >>> present, and it fails that a coredump is in progress. >>> >>> Ever since commit 06af8679449d ("coredump: Limit what can interrupt >>> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING >>> set on the dumper task but also SIGKILL set in it's pending bitmap. >>> This means that if the code is ever fixed not to short-circuit and >>> kill a process after it has already been killed the special case >>> for SIGKILL during a coredump will be broken. >>> >>> Sort all of this out by making the coredump special case more special, >>> and perform all of the work in prepare_signal and leave the rest of >>> the signal delivery path out of it. >>> >>> In prepare_signal when the process coredumping is sent SIGKILL find >>> the task performing the coredump and use sigaddset and signal_wake_up >>> to ensure that task reports fatal_signal_pending. >>> >>> Return false from prepare_signal to tell the rest of the signal >>> delivery path to ignore the signal. >>> >>> Update wait_for_dump_helpers to perform a wait_event_killable wait >>> so that if signal_pending gets set spuriously the wait will not >>> be interrupted unless fatal_signal_pending is true. >>> >>> I have tested this and verified I did not break SIGKILL during >>> coredumps by accident (before or after this change). I actually >>> thought I had and I had to figure out what I had misread that kept >>> SIGKILL during coredumps working. >>> >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>> --- >>> fs/coredump.c | 4 ++-- >>> kernel/signal.c | 11 +++++++++-- >>> 2 files changed, 11 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/coredump.c b/fs/coredump.c >>> index a6b3c196cdef..7b91fb32dbb8 100644 >>> --- a/fs/coredump.c >>> +++ b/fs/coredump.c >>> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped) >>> static bool dump_interrupted(void) >>> { >>> /* >>> - * SIGKILL or freezing() interrupt the coredumping. Perhaps we >>> + * SIGKILL or freezing() interrupted the coredumping. Perhaps we >>> * can do try_to_freeze() and check __fatal_signal_pending(), >>> * but then we need to teach dump_write() to restart and clear >>> * TIF_SIGPENDING. >>> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file) >>> * We actually want wait_event_freezable() but then we need >>> * to clear TIF_SIGPENDING and improve dump_interrupted(). >>> */ >>> - wait_event_interruptible(pipe->rd_wait, pipe->readers == 1); >>> + wait_event_killable(pipe->rd_wait, pipe->readers == 1); >>> >>> pipe_lock(pipe); >>> pipe->readers--; >>> diff --git a/kernel/signal.c b/kernel/signal.c >>> index 8272cac5f429..7e305a8ec7c2 100644 >>> --- a/kernel/signal.c >>> +++ b/kernel/signal.c >>> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) >>> sigset_t flush; >>> >>> if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) { >>> - if (!(signal->flags & SIGNAL_GROUP_EXIT)) >>> - return sig == SIGKILL; >>> + struct core_state *core_state = signal->core_state; >>> + if (core_state) { >>> + if (sig == SIGKILL) { >>> + struct task_struct *dumper = core_state->dumper.task; >>> + sigaddset(&dumper->pending.signal, SIGKILL); >>> + signal_wake_up(dumper, 1); >>> + } >>> + return false; >>> + } >>> /* >>> * The process is in the middle of dying, nothing to do. >>> */ >>> >> >> Hi, >> >> This patch breaks userspace, in particular it breaks gst-plugin-scanner >> of GStreamer which hangs now on next-20211224. IIUC, this tool builds a >> registry of good/working GStreamer plugins by loading them and >> blacklisting those that don't work (crash). Before the hang I see >> systemd-coredump process running, taking snapshot of gst-plugin-scanner >> and then gst-plugin-scanner gets stuck. >> >> Bisection points at this patch, reverting it restores >> gst-plugin-scanner. Systemd-coredump still running, but there is no hang >> anymore and everything works properly as before. >> >> I'm seeing this problem on ARM32 and haven't checked other arches. >> Please fix, thanks in advance. > > > I have not yet been able to figure out how to run gst-pluggin-scanner in > a way that triggers this yet. In truth I can't figure out how to > run gst-pluggin-scanner in a useful way. > > I am going to set up some unit tests and see if I can reproduce your > hang another way, but if you could give me some more information on what > you are doing to trigger this I would appreciate it. Thanks, Eric. The distro is Arch Linux, but it's a development environment where I'm running latest GStreamer from git master. I'll try to figure out the reproduction steps and get back to you.