Re: [PATCH v9] exec: Fix dead-lock in de_thread with ptrace_attach

Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> · Tue, 22 Jun 2021 07:10:20 +0200

On 6/16/21 11:31 PM, Bernd Edlinger wrote:
> On 6/15/21 4:26 PM, Bernd Edlinger wrote:
>> The first phase of de_thread needs co-operation from a user task,
>> if and only if any task t except the thread leader has t->ptrace.
>> Taking tasks from RUNNING->EXIT_ZOMBIE only needs co-operation from kernel code,
> 
> 
> Aehm, sorry, that is not correct, what I said here.
> 
> I totally overlooked ptrace(PTRACE_SEIZE, pid, 0L, PTRACE_O_TRACEEXIT)
> 
> and unfortunately this also prevents even the thread leader to enter the
> EXIT_ZOMBIE state because do_exit does:
> 
>         ptrace_event(PTRACE_EVENT_EXIT, code);
> 
> unfortunately this sends an event to the tracer, and waits not only for
> the tracer to call waitpid, but also needs a PTRACE_CONT before do_exit
> can call exit_notify which does tsk->exit_state = EXIT_ZOMBIE.
> 

P.S:

I think there is something really odd in ptrace_stop().

If it is intentional (which I believe to be the case) to wait here after a
SIGKILL until the process enters the exit_state == EXIT_ZOMBIE, then aborting the
pending ptrace_stop() via sigkill_pending() is questionable, especially because
arch_ptrace_stop_needed() is defined as (0) in most architectures, only sparc and
ia64 do something here.

static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t *info)
        __releases(&current->sighand->siglock)
        __acquires(&current->sighand->siglock)
{
        bool gstop_done = false;

        if (arch_ptrace_stop_needed(exit_code, info)) {
                /*
                 * The arch code has something special to do before a
                 * ptrace stop.  This is allowed to block, e.g. for faults
                 * on user stack pages.  We can't keep the siglock while
                 * calling arch_ptrace_stop, so we must release it now.
                 * To preserve proper semantics, we must do this before
                 * any signal bookkeeping like checking group_stop_count.
                 * Meanwhile, a SIGKILL could come in before we retake the
                 * siglock.  That must prevent us from sleeping in TASK_TRACED.
                 * So after regaining the lock, we must check for SIGKILL.
                 */
                spin_unlock_irq(&current->sighand->siglock);
                arch_ptrace_stop(exit_code, info);
                spin_lock_irq(&current->sighand->siglock);
                if (sigkill_pending(current))
                        return;
        }

        set_special_state(TASK_TRACED);

After this point there is no sigkill_pending() or fatal_signal_pending(), just
a single freezable_schedule() which explains why this can even wait with a fatal
signal pending.  But if the code executes the if block above the sigkill can
only be ignored if it happens immediately before the set_special_state(TASK_TRACED).

What do you think?

Bernd.