On 6/10/21 11:39 PM, akpm@xxxxxxxxxxxxxxxxxxxx wrote: > > The patch titled > Subject: exec: fix deadlock in de_thread with ptrace_attach > has been added to the -mm tree. Its filename is > exec-fix-dead-lock-in-de_thread-with-ptrace_attach.patch > > This patch should soon appear at > https://ozlabs.org/~akpm/mmots/broken-out/exec-fix-dead-lock-in-de_thread-with-ptrace_attach.patch > and later at > https://ozlabs.org/~akpm/mmotm/broken-out/exec-fix-dead-lock-in-de_thread-with-ptrace_attach.patch > > Before you just go and hit "reply", please: > a) Consider who else should be cc'ed > b) Prefer to cc a suitable mailing list as well > c) Ideally: find the original patch on the mailing list and do a > reply-to-all to that, adding suitable additional cc's > Thanks! Maybe You should also consider those two patches in the same function: [PATCH] Fix error handling in begin_new_exec https://lore.kernel.org/linux-fsdevel/87mts2kcrm.fsf@disp2133/T/#t [PATCH] exec: Copy oldsighand->action under spin-lock https://lore.kernel.org/linux-fsdevel/202106071617.5713E0A01@keescook/T/#t Bernd. > *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** > > The -mm tree is included into linux-next and is updated > there every 3-4 working days > > ------------------------------------------------------ > From: Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> > Subject: exec: fix deadlock in de_thread with ptrace_attach > > This introduces signal->unsafe_execve_in_progress, which is used to fix > the case when at least one of the sibling threads is traced, and therefore > the trace process may deadlock in ptrace_attach, but de_thread will need > to wait for the tracer to continue execution. > > The solution is to detect this situation and allow ptrace_attach to > continue, while de_thread() is still waiting for traced zombies to be > eventually released. When the current thread changed the ptrace status > from non-traced to traced, we can simply abort the whole execve and > restart it by returning -ERESTARTSYS. This needs to be done before > changing the thread leader, because the PTRACE_EVENT_EXEC needs to know > the old thread pid. > > Although it is technically after the point of no return, we just have to > reset bprm->point_of_no_return here, since at this time only the other > threads have received a fatal signal, not the current thread. > > The user's point of view it that the whole execve was simply delayed until > after the ptrace_attach. > > Other threads die quickly since the cred_guard_mutex is released, but a > deadly signal is already pending. In case the mutex_lock_killable misses > the signal, ->unsafe_execve_in_progress makes sure they release the mutex > immediately and return with -ERESTARTNOINTR. > > This means there is no API change, unlike the previous version of this > patch which was discussed here: > > https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace882@xxxxxxxxxx/ > > See tools/testing/selftests/ptrace/vmaccess.c for a test case that gets > fixed by this change. > > Note that since the test case was originally designed to test the > ptrace_attach returning an error in this situation, the test expectation > needed to be adjusted, to allow the API to succeed at the first attempt. > > Link: https://lkml.kernel.org/r/Message-ID: > Signed-off-by: Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> > Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> > Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx> > Cc: Oleg Nesterov <oleg@xxxxxxxxxx> > Cc: Kees Cook <keescook@xxxxxxxxxxxx> > Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx> > Cc: Will Drewry <wad@xxxxxxxxxxxx> > Cc: Shuah Khan <shuah@xxxxxxxxxx> > Cc: Christian Brauner <christian.brauner@xxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Serge Hallyn <serge@xxxxxxxxxx> > Cc: James Morris <jamorris@xxxxxxxxxxxxxxxxxxx> > Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > Cc: Charles Haithcock <chaithco@xxxxxxxxxx> > Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx> > Cc: Yafang Shao <laoar.shao@xxxxxxxxx> > Cc: Helge Deller <deller@xxxxxx> > Cc: YiFei Zhu <yifeifz2@xxxxxxxxxxxx> > Cc: Adrian Reber <areber@xxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Jens Axboe <axboe@xxxxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > --- > > fs/exec.c | 45 ++++++++++++++++---- > fs/proc/base.c | 6 ++ > include/linux/sched/signal.h | 13 +++++ > kernel/ptrace.c | 9 ++++ > kernel/seccomp.c | 12 ++++- > tools/testing/selftests/ptrace/vmaccess.c | 25 +++++++---- > 6 files changed, 92 insertions(+), 18 deletions(-) > > --- a/fs/exec.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/fs/exec.c > @@ -1037,6 +1037,8 @@ static int de_thread(struct task_struct > struct signal_struct *sig = tsk->signal; > struct sighand_struct *oldsighand = tsk->sighand; > spinlock_t *lock = &oldsighand->siglock; > + unsigned int prev_ptrace = tsk->ptrace; > + struct task_struct *t = tsk; > > if (thread_group_empty(tsk)) > goto no_thread_group; > @@ -1054,20 +1056,40 @@ static int de_thread(struct task_struct > return -EAGAIN; > } > > + while_each_thread(tsk, t) { > + if (unlikely(t->ptrace) && t != tsk->group_leader) > + sig->unsafe_execve_in_progress = true; > + } > + > sig->group_exit_task = tsk; > sig->notify_count = zap_other_threads(tsk); > if (!thread_group_leader(tsk)) > sig->notify_count--; > + spin_unlock_irq(lock); > > - while (sig->notify_count) { > - __set_current_state(TASK_KILLABLE); > - spin_unlock_irq(lock); > + if (unlikely(sig->unsafe_execve_in_progress)) > + mutex_unlock(&sig->cred_guard_mutex); > + > + for (;;) { > + set_current_state(TASK_KILLABLE); > + if (!sig->notify_count) > + break; > schedule(); > if (__fatal_signal_pending(tsk)) > goto killed; > - spin_lock_irq(lock); > } > - spin_unlock_irq(lock); > + __set_current_state(TASK_RUNNING); > + > + if (unlikely(sig->unsafe_execve_in_progress)) { > + if (mutex_lock_killable(&sig->cred_guard_mutex)) > + goto killed; > + sig->unsafe_execve_in_progress = false; > + if (!prev_ptrace && tsk->ptrace) { > + sig->group_exit_task = NULL; > + sig->notify_count = 0; > + return -ERESTARTSYS; > + } > + } > > /* > * At this point all other threads have exited, all we have to > @@ -1252,8 +1274,11 @@ int begin_new_exec(struct linux_binprm * > * Make this the only thread in the thread group. > */ > retval = de_thread(me); > - if (retval) > + if (retval) { > + if (retval == -ERESTARTSYS) > + bprm->point_of_no_return = false; > goto out; > + } > > /* > * Cancel any io_uring activity across execve > @@ -1460,6 +1485,11 @@ static int prepare_bprm_creds(struct lin > if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) > return -ERESTARTNOINTR; > > + if (unlikely(current->signal->unsafe_execve_in_progress)) { > + mutex_unlock(¤t->signal->cred_guard_mutex); > + return -ERESTARTNOINTR; > + } > + > bprm->cred = prepare_exec_creds(); > if (likely(bprm->cred)) > return 0; > @@ -1476,7 +1506,8 @@ static void free_bprm(struct linux_binpr > } > free_arg_pages(bprm); > if (bprm->cred) { > - mutex_unlock(¤t->signal->cred_guard_mutex); > + if (!current->signal->unsafe_execve_in_progress) > + mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > if (bprm->file) { > --- a/fs/proc/base.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/fs/proc/base.c > @@ -2748,6 +2748,12 @@ static ssize_t proc_pid_attr_write(struc > if (rv < 0) > goto out_free; > > + if (unlikely(current->signal->unsafe_execve_in_progress)) { > + mutex_unlock(¤t->signal->cred_guard_mutex); > + rv = -ERESTARTNOINTR; > + goto out_free; > + } > + > rv = security_setprocattr(PROC_I(inode)->op.lsm, > file->f_path.dentry->d_name.name, page, > count); > --- a/include/linux/sched/signal.h~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/include/linux/sched/signal.h > @@ -214,6 +214,17 @@ struct signal_struct { > #endif > > /* > + * Set while execve is executing but is *not* holding > + * cred_guard_mutex to avoid possible dead-locks. > + * The cred_guard_mutex is released *after* de_thread() has > + * called zap_other_threads(), therefore a fatal signal is > + * guaranteed to be already pending in the unlikely event, that > + * current->signal->unsafe_execve_in_progress happens to be > + * true after the cred_guard_mutex was acquired. > + */ > + bool unsafe_execve_in_progress; > + > + /* > * Thread is the potential origin of an oom condition; kill first on > * oom > */ > @@ -227,6 +238,8 @@ struct signal_struct { > struct mutex cred_guard_mutex; /* guard against foreign influences on > * credential calculations > * (notably. ptrace) > + * Held while execve runs, except when > + * a sibling thread is being traced. > * Deprecated do not use in new code. > * Use exec_update_lock instead. > */ > --- a/kernel/ptrace.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/kernel/ptrace.c > @@ -485,6 +485,14 @@ static int ptrace_traceme(void) > { > int ret = -EPERM; > > + if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) > + return -ERESTARTNOINTR; > + > + if (unlikely(current->signal->unsafe_execve_in_progress)) { > + mutex_unlock(¤t->signal->cred_guard_mutex); > + return -ERESTARTNOINTR; > + } > + > write_lock_irq(&tasklist_lock); > /* Are we already being traced? */ > if (!current->ptrace) { > @@ -500,6 +508,7 @@ static int ptrace_traceme(void) > } > } > write_unlock_irq(&tasklist_lock); > + mutex_unlock(¤t->signal->cred_guard_mutex); > > return ret; > } > --- a/kernel/seccomp.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/kernel/seccomp.c > @@ -1833,9 +1833,15 @@ static long seccomp_set_mode_filter(unsi > * Make sure we cannot change seccomp or nnp state via TSYNC > * while another thread is in the middle of calling exec. > */ > - if (flags & SECCOMP_FILTER_FLAG_TSYNC && > - mutex_lock_killable(¤t->signal->cred_guard_mutex)) > - goto out_put_fd; > + if (flags & SECCOMP_FILTER_FLAG_TSYNC) { > + if (mutex_lock_killable(¤t->signal->cred_guard_mutex)) > + goto out_put_fd; > + > + if (unlikely(current->signal->unsafe_execve_in_progress)) { > + mutex_unlock(¤t->signal->cred_guard_mutex); > + goto out_put_fd; > + } > + } > > spin_lock_irq(¤t->sighand->siglock); > > --- a/tools/testing/selftests/ptrace/vmaccess.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach > +++ a/tools/testing/selftests/ptrace/vmaccess.c > @@ -39,8 +39,15 @@ TEST(vmaccess) > f = open(mm, O_RDONLY); > ASSERT_GE(f, 0); > close(f); > - f = kill(pid, SIGCONT); > - ASSERT_EQ(f, 0); > + f = waitpid(-1, NULL, 0); > + ASSERT_NE(f, -1); > + ASSERT_NE(f, 0); > + ASSERT_NE(f, pid); > + f = waitpid(-1, NULL, 0); > + ASSERT_EQ(f, pid); > + f = waitpid(-1, NULL, 0); > + ASSERT_EQ(f, -1); > + ASSERT_EQ(errno, ECHILD); > } > > TEST(attach) > @@ -57,22 +64,24 @@ TEST(attach) > > sleep(1); > k = ptrace(PTRACE_ATTACH, pid, 0L, 0L); > - ASSERT_EQ(errno, EAGAIN); > - ASSERT_EQ(k, -1); > + ASSERT_EQ(k, 0); > k = waitpid(-1, &s, WNOHANG); > ASSERT_NE(k, -1); > ASSERT_NE(k, 0); > ASSERT_NE(k, pid); > ASSERT_EQ(WIFEXITED(s), 1); > ASSERT_EQ(WEXITSTATUS(s), 0); > - sleep(1); > - k = ptrace(PTRACE_ATTACH, pid, 0L, 0L); > - ASSERT_EQ(k, 0); > k = waitpid(-1, &s, 0); > ASSERT_EQ(k, pid); > ASSERT_EQ(WIFSTOPPED(s), 1); > ASSERT_EQ(WSTOPSIG(s), SIGSTOP); > - k = ptrace(PTRACE_DETACH, pid, 0L, 0L); > + k = ptrace(PTRACE_CONT, pid, 0L, 0L); > + ASSERT_EQ(k, 0); > + k = waitpid(-1, &s, 0); > + ASSERT_EQ(k, pid); > + ASSERT_EQ(WIFSTOPPED(s), 1); > + ASSERT_EQ(WSTOPSIG(s), SIGTRAP); > + k = ptrace(PTRACE_CONT, pid, 0L, 0L); > ASSERT_EQ(k, 0); > k = waitpid(-1, &s, 0); > ASSERT_EQ(k, pid); > _ > > Patches currently in -mm which might be from bernd.edlinger@xxxxxxxxxx are > > exec-fix-dead-lock-in-de_thread-with-ptrace_attach.patch >