I dug into exit because PTRACE_EVENT_EXIT is not guaranteed to be
called with a stack where ptrace can read and write all of the
userspace registers, which can lead to unfiltered reads and writes of
kernel stack contents.

While looking into it I realized that there are a lot of little races
between all of the ways an exit can be initiated. I don't know of a way
those races are harmful, but they make the code difficult to reason
about.

The solution this set of changes adopts is to implement good primitives
for asynchronous exit and exit_group requests, and to modify exit(2)
and exit_group(2) to use those primitives.

The result should be more consistent determination of the reason for an
exit, as well as PTRACE_EVENT_EXIT always being called from a context
(get_signal) where ptrace is guaranteed to be able to read and write
all of the registers.

I believe the set of changes could be justified for the cleanups alone
even if PTRACE_EVENT_EXIT did not need to be moved, which makes me feel
good about this approach.

If a way can be found to start coredumps from complete_signal (needed
for timely handling of fatal signals) instead of needing to start them
in do_coredump for proper synchronization, force_siginfo_to_task and
get_signal can be significantly simplified. As it is, a lot of checks
are duplicated to ensure that everything works properly in the presence
of do_coredump.

So far the code has been lightly tested, and the descriptions of some
of the patches are a bit light, but I think this shows the direction I
am aiming to travel for sorting out exit(2) and exit_group(2).

Eric W. Biederman (9):
      signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
      signal/seccomp: Refactor seccomp signal and coredump generation
      signal/seccomp: Dump core when there is only one live thread
      signal: Factor start_group_exit out of complete_signal
      signal/group_exit: Use start_group_exit in place of do_group_exit
      signal: Fold do_group_exit into get_signal fixing io_uring threads
      signal: Make individual tasks exiting a first class concept.
      signal/task_exit: Use start_task_exit in place of do_exit
      signal: Move PTRACE_EVENT_EXIT into get_signal

 arch/sh/kernel/cpu/fpu.c     |  10 +--
 fs/exec.c                    |  10 ++-
 include/linux/sched/jobctl.h |   2 +
 include/linux/sched/signal.h |   5 ++
 include/linux/sched/task.h   |   1 -
 kernel/exit.c                |  41 ++---------
 kernel/seccomp.c             |  45 +++---------
 kernel/signal.c              | 166 ++++++++++++++++++++++++++++++-------------
 8 files changed, 154 insertions(+), 126 deletions(-)