On 6/16/21 11:31 PM, Bernd Edlinger wrote:
> On 6/15/21 4:26 PM, Bernd Edlinger wrote:
>> The first phase of de_thread needs co-operation from a user task,
>> if and only if any task t except the thread leader has t->ptrace.
>> Taking tasks from RUNNING->EXIT_ZOMBIE only needs co-operation from kernel code,
>
> Aehm, sorry, that is not correct, what I said here.
>
> I totally overlooked ptrace(PTRACE_SEIZE, pid, 0L, PTRACE_O_TRACEEXIT)
>
> and unfortunately this also prevents even the thread leader to enter the
> EXIT_ZOMBIE state because do_exit does:
>
>	ptrace_event(PTRACE_EVENT_EXIT, code);
>
> unfortunately this sends an event to the tracer, and waits not only for
> the tracer to call waitpid, but also needs a PTRACE_CONT before do_exit
> can call exit_notify which does tsk->exit_state = EXIT_ZOMBIE.
>

P.S:

I think there is something really odd in ptrace_stop().

If it is intentional (which I believe to be the case) to wait here
after a SIGKILL until the process enters the exit_state == EXIT_ZOMBIE,
then aborting the pending ptrace_stop() via sigkill_pending() is
questionable, especially because arch_ptrace_stop_needed() is defined
as (0) in most architectures, only sparc and ia64 do something here.

static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t *info)
	__releases(&current->sighand->siglock)
	__acquires(&current->sighand->siglock)
{
	bool gstop_done = false;

	if (arch_ptrace_stop_needed(exit_code, info)) {
		/*
		 * The arch code has something special to do before a
		 * ptrace stop.  This is allowed to block, e.g. for faults
		 * on user stack pages.  We can't keep the siglock while
		 * calling arch_ptrace_stop, so we must release it now.
		 * To preserve proper semantics, we must do this before
		 * any signal bookkeeping like checking group_stop_count.
		 * Meanwhile, a SIGKILL could come in before we retake the
		 * siglock.  That must prevent us from sleeping in TASK_TRACED.
		 * So after regaining the lock, we must check for SIGKILL.
		 */
		spin_unlock_irq(&current->sighand->siglock);
		arch_ptrace_stop(exit_code, info);
		spin_lock_irq(&current->sighand->siglock);
		if (sigkill_pending(current))
			return;
	}

	set_special_state(TASK_TRACED);

After this point there is no sigkill_pending() or fatal_signal_pending(),
just a single freezable_schedule() which explains why this can even wait
with a fatal signal pending.  But if the code executes the if block above,
the sigkill can only be ignored if it happens immediately before the
set_special_state(TASK_TRACED).

What do you think?


Bernd.