Oleg,

Don't kill me, but this is another attempt at making pidfd polling
consistent for multi-threaded exec and premature thread-group leader
exit.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., a
    non-thread-group-leader thread, all other threads in the
    thread-group, including the thread-group leader, are killed and the
    struct pid of the thread-group leader is taken over by the
    subthread that called exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
    leader exited before all of the other subthreads in the
    thread-group have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor,
depending on when poll is performed. If the poll is performed before
the exec of the subthread has concluded, an exit notification is
generated for the old thread-group leader. If the poll is performed
after the exec of the subthread has concluded, no exit notification is
generated for the old thread-group leader.

The correct behavior would be to simply not generate an exit
notification on the struct pid of a subthread exec because the struct
pid is taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group leader may exit
prematurely as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus may also exec at some point.

So far there was no way to distinguish between (1) and (2) internally.
This tiny series tries to address this problem by remembering a
premature leader exit in struct pid and forgetting it when a subthread
execs and takes over the old thread-group leader's struct pid.
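
To make the intended bookkeeping concrete, here is a rough userspace toy
model in C. It is not the kernel code from this series: struct toy_pid,
its fields and the toy_* helpers are hypothetical names made up for
illustration, and the model glosses over the difference between a thread
exiting and being reaped. It only sketches the idea of remembering a
premature leader exit and forgetting it on a subthread exec.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-struct-pid state; not the real layout. */
struct toy_pid {
	int live_threads;         /* threads still alive in the thread-group */
	bool leader_alive;        /* is a task currently attached as leader? */
	bool leader_exited_early; /* remembered premature leader exit */
};

/*
 * The task that owns the leader's struct pid exits.
 * Returns true if a PIDFD_THREAD poller should be notified right now.
 */
bool toy_leader_exit(struct toy_pid *p)
{
	p->leader_alive = false;
	p->live_threads--;
	if (p->live_threads > 0) {
		/* Case (2): subthreads remain, so remember and stay quiet. */
		p->leader_exited_early = true;
		return false;
	}
	return true;
}

/*
 * A subthread exits. Returns true if this was the last thread and a
 * delayed notification for the leader's struct pid is now due.
 */
bool toy_subthread_exit(struct toy_pid *p)
{
	p->live_threads--;
	return p->live_threads == 0 && p->leader_exited_early;
}

/*
 * Case (1): a subthread execs, all other threads are killed and it takes
 * over the leader's struct pid. The remembered exit is forgotten and no
 * notification is generated, because the struct pid stays alive.
 */
bool toy_subthread_exec(struct toy_pid *p)
{
	p->live_threads = 1;
	p->leader_alive = true;
	p->leader_exited_early = false;
	return false;
}
```

In this model, with three threads, a leader exit followed by two
subthread exits produces a notification only on the last exit. If a
subthread execs in between, the remembered exit is dropped and the
notification is produced only once the exec'ing task itself exits.
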
If that works correctly then no exit notifications are generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads have
been reaped. If a subthread execs before that point, no exit
notification will be generated until that task exits or it creates
subthreads and repeats the cycle.

Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
---
Christian Brauner (2):
      pidfs: improve multi-threaded exec and premature thread-group leader exit polling
      selftests/pidfd: test multi-threaded exec polling

 fs/pidfs.c                                      | 28 +++++++++++++++++++++++--
 include/linux/pid.h                             |  3 ++-
 kernel/exit.c                                   | 24 +++++++++++++++++++--
 kernel/pid.c                                    |  9 ++++++++
 tools/testing/selftests/pidfd/pidfd_info_test.c | 23 ++++++++++----------
 5 files changed, 71 insertions(+), 16 deletions(-)
---
base-commit: c0ff2d6e30f20fd943eac5cdc3b0e89f2f2566ce
change-id: 20250317-work-pidfs-thread_group-141682f9a50a