On Thu, Mar 06, 2025 at 01:17:13PM +0100, Oleg Nesterov wrote: > On 03/06, Christian Brauner wrote: > > > > Back when we implemented support for PIDFD_THREAD we ended up with the > > decision that if userspace holds: > > > > pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD) > > > > that exit notification is not strictly defined if a non-thread-group > > leader thread execs: > > Yes, this was even documented in commit 64bef697d33b ... Yeah, I'm aware I was just revisiting this decision. > > > --- a/kernel/exit.c > > +++ b/kernel/exit.c > > @@ -745,8 +745,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > > /* > > * sub-thread or delay_group_leader(), wake up the > > * PIDFD_THREAD waiters. > > + * > > + * The thread-group leader will be taken over by the execing > > + * task so don't cause spurious wakeups. > > */ > > - if (!thread_group_empty(tsk)) > > + if (!thread_group_empty(tsk) && (tsk->signal->notify_count >= 0)) > > do_notify_pidfd(tsk); > > > > if (unlikely(tsk->ptrace)) { > > perhaps... but this won't help if the leader exits and that another > thread does exec? > > >From the changelog > > Perhaps we can improve this behaviour later, pidfd_poll() > can probably take sig->group_exec_task into account. But > this doesn't really differ from the case when the leader > exits before other threads (so pidfd_poll() succeeds) and > then another thread execs and pidfd_poll() will block again. > > so I am not sure what can we do. I think early thread-group leader exit is a bug in the userspace program whereas multi-threaded exec is well-defined (if ugly). To detect early-thread-group leader exec userspace could do: pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, 0) pidfd_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD) and then they need to add both file descriptors to the poll instance. If proper multi-threaded exec happens no notification will be generated on either file descriptor. However, if the thread-group leader exits prematurely then a notification will be generated on pidfd_thread allowing detection of malformed behavior. So the premature thread-group leader exec thing is probably something we can ignore as a real use-case. > > I'll try to think more later, but I can't promise much :( > > Oleg. >