Hey, Various tools need access to information about a process/task even after it has already been reaped. For example, systemd's journal logs and uses such information as the cgroup id and exit status to deal with processes that have been sent via SCM_PIDFD or SCM_PEERPIDFD. By the time the pidfd is received the process might have already been reaped. This series aims to provide information by extending the PIDFD_GET_INFO ioctl to retrieve the exit code and cgroup id. There might be other stuff that we would want in the future. Pidfd polling allows waiting on either task exit or for a task to have been reaped. The contract for PIDFD_INFO_EXIT is simply that EPOLLHUP must be observed before exit information can be retrieved, i.e., exit information is only provided once the task has been reaped. Note, that if a thread-group leader exits before other threads in the thread-group then exit information will only be available once the thread-group is empty. This aligns with wait() as well, where reaping of a thread-group leader that exited before the thread-group was empty is delayed until the thread-group is empty. With PIDFD_INFO_EXIT autoreaping might actually become usable because it means a parent can ignore SIGCHLD or set SA_NOCLDWAIT and simply use pidfd polling and PIDFD_INFO_EXIT to get get status information for its children. The kernel will autocleanup right away instead of delaying. This includes expansive selftests including for thread-group behior and multi-threaded exec by a non-thread-group leader thread. Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> --- Changes in v3: - Fix various minor issues. - Link to v2: https://lore.kernel.org/r/20250304-work-pidfs-kill_on_last_close-v2-0-44fdacfaa7b7@xxxxxxxxxx Changes in v2: - Call pidfs_exit() from release_task(). - Don't provide exit information once the task has exited but once the task has been reaped. This makes for simpler semantics. Thus, call pidfs_exit() from release_task(). - Link to v1: https://lore.kernel.org/r/20250228-work-pidfs-kill_on_last_close-v1-0-5bd7e6bb428e@xxxxxxxxxx --- Christian Brauner (16): pidfs: switch to copy_struct_to_user() pidfd: rely on automatic cleanup in __pidfd_prepare() pidfs: move setting flags into pidfs_alloc_file() pidfs: use private inode slab cache pidfs: record exit code and cgroupid at exit pidfs: allow to retrieve exit information selftests/pidfd: fix header inclusion pidfs/selftests: ensure correct headers for ioctl handling selftests/pidfd: expand common pidfd header selftests/pidfd: add first PIDFD_INFO_EXIT selftest selftests/pidfd: add second PIDFD_INFO_EXIT selftest selftests/pidfd: add third PIDFD_INFO_EXIT selftest selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest fs/internal.h | 1 + fs/libfs.c | 4 +- fs/pidfs.c | 182 +++++++- include/linux/pidfs.h | 1 + include/uapi/linux/pidfd.h | 3 +- kernel/exit.c | 2 + kernel/fork.c | 15 +- tools/testing/selftests/pidfd/.gitignore | 2 + tools/testing/selftests/pidfd/Makefile | 4 +- tools/testing/selftests/pidfd/pidfd.h | 93 ++++ tools/testing/selftests/pidfd/pidfd_exec_helper.c | 12 + tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 + tools/testing/selftests/pidfd/pidfd_info_test.c | 497 ++++++++++++++++++++++ tools/testing/selftests/pidfd/pidfd_open_test.c | 26 -- tools/testing/selftests/pidfd/pidfd_setns_test.c | 45 -- 15 files changed, 783 insertions(+), 105 deletions(-) --- base-commit: b1e809e7f64ad47dd232ff072d8ef59c1fe414c5 change-id: 20250227-work-pidfs-kill_on_last_close-a23ddf21db47