On 6/12/21 1:16 AM, Andrew Morton wrote: > On Fri, 11 Jun 2021 17:55:09 +0200 Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> wrote: > >> This introduces signal->unsafe_execve_in_progress, >> which is used to fix the case when at least one of the >> sibling threads is traced, and therefore the trace >> process may dead-lock in ptrace_attach, but de_thread >> will need to wait for the tracer to continue execution. >> >> The solution is to detect this situation and allow >> ptrace_attach to continue, while de_thread() is still >> waiting for traced zombies to be eventually released. >> When the current thread changed the ptrace status from >> non-traced to traced, we can simply abort the whole >> execve and restart it by returning -ERESTARTSYS. >> This needs to be done before changing the thread leader, >> because the PTRACE_EVENT_EXEC needs to know the old >> thread pid. >> >> Although it is technically after the point of no return, >> we just have to reset bprm->point_of_no_return here, >> since at this time only the other threads have received >> a fatal signal, not the current thread. >> >> >From the user's point of view the whole execve was >> simply delayed until after the ptrace_attach. >> >> Other threads die quickly since the cred_guard_mutex >> is released, but a deadly signal is already pending. >> In case the mutex_lock_killable misses the signal, >> ->unsafe_execve_in_progress makes sure they release >> the mutex immediately and return with -ERESTARTNOINTR. >> >> This means there is no API change, unlike the previous >> version of this patch which was discussed here: >> >> https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace882@xxxxxxxxxx/ >> >> See tools/testing/selftests/ptrace/vmaccess.c >> for a test case that gets fixed by this change. >> >> Note that since the test case was originally designed to >> test the ptrace_attach returning an error in this situation, >> the test expectation needed to be adjusted, to allow the >> API to succeed at the first attempt. >> > > err, sorry. I replied to the v8 patch, not to v9. > Sorry for the confusion. Originally the loop here looked was entered with sighand locked and was like this: while (sig->notify_count) { __set_current_state(TASK_KILLABLE); if (!sig->notify_count) break; spin_unlock_irq(lock); schedule(); if (__fatal_signal_pending(tsk)) goto killed; } spin_unlock_irq(lock); v8 did this (tried avoid lots of spin-lock/unlocks): sig->group_exit_task = tsk; sig->notify_count = zap_other_threads(tsk); if (!thread_group_leader(tsk)) sig->notify_count--; spin_unlock_irq(lock); if (unlikely(sig->unsafe_execve_in_progress)) mutex_unlock(&sig->cred_guard_mutex); for (;;) { set_current_state(TASK_KILLABLE); if (!sig->notify_count) break; schedule(); if (__fatal_signal_pending(tsk)) goto killed; } but here I overlooked that there is an execution path without any spin-lock where sig->group_exit_task is set to NULL, which could create a race with __signal_exit. So v9 keeps the loop as it was, and instead does this: if (unlikely(sig->unsafe_execve_in_progress)) { spin_unlock_irq(lock); mutex_unlock(&sig->cred_guard_mutex); spin_lock_irq(lock); } because I would not like to release the mutex while an interrupt spin-lock is held. Bernd. > --- a/fs/exec.c~exec-fix-dead-lock-in-de_thread-with-ptrace_attach-v9 > +++ a/fs/exec.c > @@ -1056,29 +1056,31 @@ static int de_thread(struct task_struct > return -EAGAIN; > } > > - while_each_thread(tsk, t) { > - if (unlikely(t->ptrace) && t != tsk->group_leader) > - sig->unsafe_execve_in_progress = true; > - } > - > sig->group_exit_task = tsk; > sig->notify_count = zap_other_threads(tsk); > if (!thread_group_leader(tsk)) > sig->notify_count--; > - spin_unlock_irq(lock); > > - if (unlikely(sig->unsafe_execve_in_progress)) > + while_each_thread(tsk, t) { > + if (unlikely(t->ptrace) && t != tsk->group_leader) > + sig->unsafe_execve_in_progress = true; > + } > + > + if (unlikely(sig->unsafe_execve_in_progress)) { > + spin_unlock_irq(lock); > mutex_unlock(&sig->cred_guard_mutex); > + spin_lock_irq(lock); > + } > > - for (;;) { > - set_current_state(TASK_KILLABLE); > - if (!sig->notify_count) > - break; > + while (sig->notify_count) { > + __set_current_state(TASK_KILLABLE); > + spin_unlock_irq(lock); > schedule(); > if (__fatal_signal_pending(tsk)) > goto killed; > + spin_lock_irq(lock); > } > - __set_current_state(TASK_RUNNING); > + spin_unlock_irq(lock); > > if (unlikely(sig->unsafe_execve_in_progress)) { > if (mutex_lock_killable(&sig->cred_guard_mutex)) > _ >