On 03/02, Christian Brauner wrote:
>
> On Sun, Mar 02, 2025 at 04:53:46PM +0100, Oleg Nesterov wrote:
> > On 02/28, Christian Brauner wrote:
> > >
> > > Some tools like systemd's journal need to retrieve the exit and cgroup
> > > information after a process has already been reaped.
> >               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > But unless I am totally confused do_exit() calls pidfd_exit() even
> > before exit_notify(), the exiting task is not even zombie yet. It
> > will be reaped only when it passes exit_notify() and its parent does
> > wait().
>
> The overall goal is that it's possible to retrieve exit status and
> cgroupid even if the task has already been reaped.

OK, please see below...

> It's intentionally placed before exit_notify(), i.e., before the task is
> a zombie because exit_notify() wakes pidfd-pollers. Ideally, pidfd
> pollers would be woken and then could use the PIDFD_GET_INFO ioctl to
> retrieve the exit status.

This was more or less clear to me. But this doesn't match the "the task
has already been reaped" goal above...

> It would however be fine to place it into exit_notify() if it's a better
> fit there. If you have a preference let me know.
>
> I don't see a reason why seeing the exit status before that would be an
> issue.

The problem is that it is not clear how we can do this correctly.
Especially considering the problem with exec...

> > But what if this file was created without PIDFD_THREAD? If another
> > thread does exit_group(1) after that, the process's exit code is
> > 1 << 8, but it can't be retrieved.
>
> Yes, I had raised that in an off-list discussion about this as well and
> was unsure what the cleanest way of dealing with this would be.

I am not sure either, but again, please see below.

> > Now, T is very much alive, but pidfs_i(inode)->exit_info != NULL.
> ...
> What's the best way of handling the de_thread() case? Would moving this
> into exit_notify() be enough where we also handle
> PIDFD_THREAD/~PIDFD_THREAD waking?

I don't think that moving pidfd_exit() into exit_notify() can solve any
problem. But what if we move pidfd_exit() into release_task() paths?
Called when the task is reaped by the parent/debugger, or if a sub-thread
auto-reaps.

Can the users of pidfd_info(PIDFD_INFO_EXIT) rely on POLLHUP from
release_task() -> detach_pid() -> __change_pid(new => NULL) ?

Oleg.