On Thu, Jan 25, 2024 at 06:51:14PM +0100, Oleg Nesterov wrote:
> On 01/25, Christian Brauner wrote:
> >
> > > > When it is reaped is "mostly unrelated".
> > >
> > > Then why pidfd_poll() can't simply check !task || task->exit_state ?
> > >
> > > Nevermind. So, currently pidfd_poll() succeeds when the leader can be
> >
> > Hm, the comment right above mentions:
> >
> > /*
> >  * Inform pollers only when the whole thread group exits.
> >  * If the thread group leader exits before all other threads in the
> >  * group, then poll(2) should block, similar to the wait(2) family.
> >  */
> > > reaped, iow the whole thread group has exited.
>
> Yes, but the comment doesn't contradict with what I have said?

No, it doesn't. I'm trying to understand what you are suggesting, though.
Are you saying that !task || task->exit_state is enough and that we
shouldn't use the helper that was added in commit 38fd525a4c61 ("exit: Factor thread_group_exited out of pidfd_poll")?
If so, what does open-coding the check buy us over using that helper?
Is there an actual bug here?

> > > But even if you are the
> > > parent, you can't expect that wait(WNOHANG) must succeed, the leader
> > > can be traced. I guess it is too late to change this behaviour.
> >
> > Hm, why is that an issue though?
>
> Well, I didn't say this is a problem. I simply do not know how/why people
> use pidfd_poll().

Sorry, I just had a hard time understanding what you meant then. :)
"I guess it is too late to change this behaviour." made it sound like
a) there's a problem and b) you would prefer to change the behavior.
So it seemed that wait(WNOHANG) not succeeding after the traced leader of
an empty thread group has exited is a problem in your eyes.

>
> I mostly tried to explain why do I think that do_notify_pidfd() should
> be always called from exit_notify() path, not by release_task(), even
> if the task is not a leader.
> > Because a program would rely on WNOHANG to hang on
> > a ptraced leader? That seems esoteric imho.
>
> To me it would be useful, but let's not discuss this now. The "patch"

Ok, that's good then. I would expect that at least tools like rr make use
of pidfds and might rely on this behavior, although I haven't checked
their code.

> I sent doesn't change the current behaviour.

Yeah, I got that, but it would still be useful to understand the wider
context you were addressing.

>
> > > What if we add the new PIDFD_THREAD flag? With this flag
> > >
> > > - sys_pidfd_open() doesn't require the must be a group leader
> >
> > Yes.
> >
> > >
> > > - pidfd_poll() succeeds when the task passes exit_notify() and
> > >   becomes a zombie, even if it is a leader and has other threads.
> >
> > Iiuc, if an existing user creates a pidfd for a thread-group leader and
> > then polls that pidfd they would currently only get notified if the
> > thread-group is empty and the leader has exited.
> >
> > If we now start notifying when the thread-group leader exits but the
> > thread-group isn't empty then this would be a fairly big api change
>
> Hmm... again, this patch doesn't (shouldn't) change the current behavior.
>
> Please note "with this flag" above. If sys_pidfd_open() was called
> without PIDFD_THREAD, then sys_pidfd_open() still requires that the
> target task must be a group leader, and pidfd_poll() won't succeed
> until the leader exits and thread_group_empty() is true.

Yeah, I missed the PIDFD_THREAD flag suggestion. Sorry about that.

Btw, I'm not sure whether you remember, but when we originally did the
pidfd work, you and I discussed thread support and already decided back
then that a flag like PIDFD_THREAD would likely be the way to go.

The PIDFD_THREAD flag would be interesting because we could make
pidfd_send_signal() support it as well, to allow sending a signal to a
specific thread. That's something I had also wanted to support, and I've
been asked for it a few times already.

What do you think?
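For reference, the current (flagless) semantics discussed above can be exercised from userspace roughly as follows. This is a minimal sketch, not code from the patch. It assumes Linux 5.3+ and a libc that exposes syscall(2) and SYS_pidfd_open, with a fallback syscall number that is only valid on x86-64 and arm64. wait_for_exit_via_pidfd() is a hypothetical helper name. The sketch opens a pidfd for a child that is a thread-group leader, blocks in poll(2) until the pidfd becomes readable, and then tries waitpid(WNOHANG). It does not use the proposed PIDFD_THREAD flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: syscall number on x86-64 and arm64 */
#endif

/*
 * Hypothetical helper (not a kernel or libc API): wait until the thread
 * group led by child `pid` has exited, using a pidfd, then try to reap it.
 *
 * Returns  0: exited and reaped, *status filled in (if non-NULL)
 *          1: timed out, the thread group has not fully exited yet
 *          2: the pidfd reported the exit, but waitpid(WNOHANG) could not
 *             reap the child yet (e.g. it is ptraced by another process)
 *         -1: error, errno set
 */
static int wait_for_exit_via_pidfd(pid_t pid, int timeout_ms, int *status)
{
	assert(pid > 0);

	/* Without PIDFD_THREAD, pid must refer to a thread-group leader. */
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/*
	 * For a leader, the pidfd only becomes readable once the whole
	 * thread group has exited, not when the leader alone has.
	 * On EINTR the poll is simply restarted with the full timeout.
	 */
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int ret;
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	/* Preserve poll()'s errno across close(). */
	int saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 1;

	/* A readable pidfd does not guarantee that the parent can reap yet. */
	pid_t r = waitpid(pid, status, WNOHANG);
	if (r < 0)
		return -1;
	return r == 0 ? 2 : 0;
}
```

Return value 2 corresponds to Oleg's point above. pidfd_poll() can already report the exit while the parent's wait(WNOHANG) still does not succeed because the leader is traced.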