[Sorry, I had to resend: it was pointed out to me that when I sent this series the first time, DKIM got broken by the kvack list rewriting 8-bit into quoted-printable.]

At the moment, there is a lifetime issue (no, not the UAF kind) around __ptrace_may_access(). It wants to check mm->flags and mm->user_ns to figure out whether the caller should be allowed to access some target task. __ptrace_may_access() can be called as long as __put_task_struct() hasn't happened yet; but __put_task_struct() happens when the task is about to be freed, which is much later than exit_mm() (which runs pretty early during task exit). So we can end up in a situation where we need to consult the mm for a security check, but no longer have an mm.

At the moment, this is solved by failing open: if the mm is gone, we pretend that it was dumpable. That's dubious from a security perspective. As one example, we drop the mm_struct before the file descriptor table, so someone might be able to steal file descriptors from an exiting task when dumpability was supposed to prevent that.

The easy fix would be to let __ptrace_may_access() always refuse access to tasks that have lost their mm; but that would, for example, restrict the ability to inspect dead tasks in procfs. So while that might work in practice, it'd be a bit ugly, too.

Another option would be to move the dumpability information elsewhere, but the only candidate is the task_struct (the signal_struct can be shared with dead pre-execve threads, so we can't use it here). We'd then have to keep the dumpability information in sync across threads, which would probably be pretty ugly.

So I think the proper fix is to let the task_struct hold a reference on the mm_struct until the task goes away completely.
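To make the fail-open behavior and the proposed fix concrete, here is a small userspace model. It is a sketch, not kernel code: struct mm_model, struct task_model and all *_model / may_access_* helpers are hypothetical stand-ins. The `dumpable` flag models get_dumpable(mm) == SUID_DUMP_USER, `caller_has_cap` models the ptrace_has_cap() check against mm->user_ns, and the model folds the kernel's separate mm_users/mm_count counters into a single count. The `exit_mm` field name is borrowed from the patch 5/6 title; how it is used here is assumed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct: only what the check needs. */
struct mm_model {
	int refs;       /* references held by tasks (single merged counter) */
	bool dumpable;  /* models get_dumpable(mm) == SUID_DUMP_USER */
};

/* Hypothetical stand-in for struct task_struct. */
struct task_model {
	struct mm_model *mm;       /* cleared early, by exit_mm() */
	struct mm_model *exit_mm;  /* proposed: kept until the task is freed */
};

static void mm_put_model(struct mm_model *mm)
{
	if (mm && --mm->refs == 0)
		free(mm);
}

/* Deny if the mm is non-dumpable and the caller lacks the capability. */
static bool check(const struct mm_model *mm, bool caller_has_cap)
{
	if (mm && !mm->dumpable && !caller_has_cap)
		return false;
	return true;  /* a NULL mm is treated as "allowed" (fail open) */
}

/* Today: once exit_mm() has run, t->mm is NULL and the check fails open. */
static bool may_access_old(const struct task_model *t, bool caller_has_cap)
{
	return check(t->mm, caller_has_cap);
}

/* Proposed: fall back to the mm the task still holds a reference to. */
static bool may_access_new(const struct task_model *t, bool caller_has_cap)
{
	return check(t->mm ? t->mm : t->exit_mm, caller_has_cap);
}

/* Early in task exit: stash the reference instead of dropping it. */
static void exit_mm_model(struct task_model *t)
{
	t->exit_mm = t->mm;
	t->mm = NULL;
}

/* When the task is finally freed (cf. __put_task_struct()). */
static void put_task_struct_model(struct task_model *t)
{
	mm_put_model(t->exit_mm);
	t->exit_mm = NULL;
}
```

With a non-dumpable mm and an unprivileged caller, both helpers deny access before exit_mm_model(); afterwards may_access_old() returns true (the fail-open case) while may_access_new() still denies, and a privileged caller is still allowed.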
This is implemented in patch 1/6, which is also the only patch in this series that I actually care about (and the only one marked for stable backporting). The rest of the series consists of tweaks in case people dislike the idea of constantly freeing mm_structs from workqueue context. Those tweaks should also reduce the memory usage of dead tasks by ensuring that they don't keep their PGDs alive.

Patch 1/6 is not particularly pretty, but I can't think of any better way to do it.

So: does this series (and in particular patch 1/6) look vaguely sane? And if not, does anyone have a better approach?

Jann Horn (6):
  ptrace: Keep mm around after exit_mm() for __ptrace_may_access()
  refcount: Move refcount_t definition into linux/types.h
  mm: Add refcount for preserving mm_struct without pgd
  mm, oom: Use mm_ref()/mm_unref() and avoid mmdrop_async()
  ptrace: Use mm_ref() for ->exit_mm
  mm: remove now-unused mmdrop_async()

 arch/x86/kernel/tboot.c    |  2 +
 drivers/firmware/efi/efi.c |  2 +
 include/linux/mm_types.h   | 15 ++++++-
 include/linux/refcount.h   | 13 +-----
 include/linux/sched.h      |  8 ++++
 include/linux/sched/mm.h   | 13 ++++++
 include/linux/types.h      | 12 +++++
 kernel/exit.c              |  2 +
 kernel/fork.c              | 90 +++++++++++++++++---------------------
 kernel/ptrace.c            | 10 +++++
 mm/init-mm.c               |  2 +
 mm/oom_kill.c              |  2 +-
 12 files changed, 105 insertions(+), 66 deletions(-)

base-commit: bbf5c979011a099af5dc76498918ed7df445635b
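The PGD-saving tweak (patches 3/6 to 5/6, "preserving mm_struct without pgd") can be pictured with a second userspace model. This is an assumption about the general shape, not the series' actual code: the two counters, the struct layout, the *_model helpers and the 4096-byte stand-in allocation are all hypothetical. Only the names mm_ref()/mm_unref() (from the series) and mmdrop() (existing kernel API) mentioned in the comments are real.

```c
#include <stdlib.h>

/*
 * Hypothetical two-level refcount model:
 *   pgd_refs    ~ holders that need the page tables (like mm_count)
 *   struct_refs ~ holders that only need the mm_struct itself
 * All pgd_refs holders together own one struct_refs reference.
 */
struct mm_model {
	int pgd_refs;
	int struct_refs;
	void *pgd;     /* stands in for the page-table root */
	int dumpable;  /* the kind of field a dead task still needs */
};

static struct mm_model *mm_alloc_model(void)
{
	struct mm_model *mm = calloc(1, sizeof(*mm));
	if (!mm)
		return NULL;
	mm->pgd = malloc(4096);
	if (!mm->pgd) {
		free(mm);
		return NULL;
	}
	mm->pgd_refs = 1;
	mm->struct_refs = 1;  /* held on behalf of all pgd_refs holders */
	return mm;
}

/* Keep only the struct alive (cf. mm_ref() in the series). */
static void mm_ref_model(struct mm_model *mm)
{
	mm->struct_refs++;
}

/* Drop a struct-only reference (cf. mm_unref() in the series). */
static void mm_unref_model(struct mm_model *mm)
{
	if (--mm->struct_refs == 0)
		free(mm);
}

/* Drop a page-table reference (cf. mmdrop()). */
static void mmdrop_model(struct mm_model *mm)
{
	if (--mm->pgd_refs == 0) {
		free(mm->pgd);       /* page tables go as soon as nobody needs them */
		mm->pgd = NULL;
		mm_unref_model(mm);  /* frees the struct only if no struct-only holders remain */
	}
}
```

Under this scheme, a dead task that holds only a struct reference no longer pins the page tables: once the last page-table user calls mmdrop_model(), the PGD is freed, while fields such as `dumpable` stay readable until the task drops its mm_unref_model() reference.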