On Sun, Mar 01, 2020 at 07:21:03PM +0100, Jann Horn wrote: > On Sun, Mar 1, 2020 at 12:27 PM Bernd Edlinger > <bernd.edlinger@xxxxxxxxxx> wrote: > > The proposed solution is to have a second mutex that is > > used in mm_access, so it is allowed to continue while the > > dying threads are not yet terminated. > > Just for context: When I proposed something similar back in 2016, > https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@xxxxxxxxxx/ > was the resulting discussion thread. At least back then, I looked > through the various existing users of cred_guard_mutex, and the only > places that couldn't be converted to the new second mutex were > PTRACE_ATTACH and SECCOMP_FILTER_FLAG_TSYNC. > > > The ideal solution would IMO be something like this: Decide what the > new task's credentials should be *before* reaching de_thread(), > install them into a second cred* on the task (together with the new > dumpability), drop the cred_guard_mutex, and let ptrace_may_access() > check against both. After that, some further restructuring might even Hm, so essentially a private ptrace_access_cred member in task_struct? That would presumably also involve altering various LSM hooks to look at ptrace_access_cred. (Minor side-note, de_thread() takes a struct task_struct argument but only ever is passed current.) > allow the cred_guard_mutex to not be held across all of the VFS > operations that happen early on in execve, which may block > indefinitely. But that would be pretty complicated, so I think your > proposed solution makes sense for now, given that nobody has managed > to implement anything better in the last few years. Reading through the old threads and how often this issue came up, I tend to agree.