On Tue, Jan 23, 2024 at 08:56:08PM +0100, Oleg Nesterov wrote:
> Too late for me, but I don't understand this patch after a quick glance.
> perhaps I missed something...

Thanks for taking a look.

> On 01/23, Tycho Andersen wrote:
> >
> > @@ -256,6 +256,17 @@ void release_task(struct task_struct *p)
> >          write_lock_irq(&tasklist_lock);
> >          ptrace_release_task(p);
> >          thread_pid = get_pid(p->thread_pid);
> > +
> > +        /*
> > +         * If we're not the leader, notify any waiters on our pidfds. Note that
> > +         * we don't want to notify the leader until /everyone/ in the thread
> > +         * group is dead, viz. the condition below.
> > +         *
> > +         * We have to do this here, since __exit_signal() will
> > +         * __unhash_processes(), and break do_notify_pidfd()'s lookup.
> > +         */
> > +        if (!thread_group_leader(p))
> > +                do_notify_pidfd(p);
>
> This doesn't look consistent.
>
> If the task is a group leader do_notify_pidfd() is called by exit_notify()
> when it becomes a zombie (if no other threads), before it is reaped by its
> parent (unless autoreap).

There is another path, also in release_task(), that I was trying to
mirror since it deals explicitly with sub-threads but,

> If it is a sub-thread, it is called by release_task() above. Note that a
> sub-thread can become a zombie too if it is traced.

I didn't know about this.

> >          __exit_signal(p);
>
> and, do_notify_pidfd() is called before __exit_signal() which does
> __unhash_process() -> detach_pid(PIDTYPE_PID).
>
> Doesn't this mean that pidfd_poll() can hang? thread_group_exited()
> won't return true after do_notify_pidfd() above, not to mention that
> thread_group_empty() is not possible if !thread_group_leader().

I was wondering about this too, but the test_non_tgl_poll_exit test in
the next patch tests exactly this and works as expected.

> So. When do we want to do do_notify_pidfd() ? When the task (leader or not)
> becomes a zombie (passes exit_notify) or when it is reaped by release_task?

It seems like we'd want it when exit_notify() is called in principle,
since that's when the pid actually dies. When it is reaped is "mostly
unrelated". Something like,

1. in the "normal" exit_notify() paths via do_notify_parent()
2. if none of those cases are true (aka the final else in
   exit_notify()) and the thread is not ptraced
3. via release_task() finally if this was the thread group leader and
   it died before some sub-thread

then in pidfd_poll(), we can do:

    if (!tsk || (tsk->exit_state >= 0) || thread_group_exited())
            do_notify_pidfd();

?

> Either way pidfd_poll() needs more changes with this patch and it can't
> use thread_group_exited(). If do_notify_pidfd() is called by release_task()
> after __exit_signal(), it can just check pid_has_task(PIDTYPE_PID).

I suppose this is why my test works, since pid_task(PIDTYPE_PID) is
null after release_task(). But if we want it to happen earlier, we'll
have to do something like the above.

Tycho
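
For reference, the two placements discussed above imply different
readiness checks in pidfd_poll(). What follows is a minimal user-space
sketch, not kernel code. struct model_task and both helper functions
are hypothetical names; `hashed` stands in for
pid_has_task(pid, PIDTYPE_PID). It also assumes that "has exited" means
a nonzero exit_state (the draft condition above is written with >= 0,
which a running task with exit_state == 0 would satisfy as well).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state pidfd_poll() would look at. */
struct model_task {
	bool hashed;     /* models pid_has_task(pid, PIDTYPE_PID) */
	int  exit_state; /* 0 while running, nonzero once zombie or dead */
};

/*
 * Placement A: the notification is sent from release_task() after
 * __exit_signal(). By then the task is unhashed, so checking the
 * lookup alone is enough.
 */
bool poll_ready_after_reap(const struct model_task *t)
{
	return !t->hashed;
}

/*
 * Placement B: the notification is sent when the task passes
 * exit_notify(). A zombie is still hashed, so exit_state has to be
 * checked as well (assumption: nonzero means "exited").
 */
bool poll_ready_at_exit(const struct model_task *t)
{
	return !t->hashed || t->exit_state != 0;
}

/* Walk a task through running -> zombie -> reaped under both checks. */
void model_demo(void)
{
	struct model_task t = { .hashed = true, .exit_state = 0 };

	assert(!poll_ready_after_reap(&t) && !poll_ready_at_exit(&t));

	t.exit_state = 1;   /* became a zombie, not yet reaped */
	assert(!poll_ready_after_reap(&t) && poll_ready_at_exit(&t));

	t.hashed = false;   /* reaped: release_task() has run */
	assert(poll_ready_after_reap(&t) && poll_ready_at_exit(&t));
}
```

The zombie step is the only one where the two checks disagree, which is
the window the question about "zombie vs. reaped" is asking about.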