Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
>
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.

I took a look through the code.  It used to be that TIF_NOTIFY_SIGNAL
was all about waking up a task so that task_work_run can be used.
io_uring still mostly uses it that way.  There is also a use in
kthread_stop that just uses it as a TIF_SIGPENDING without having a
pending signal.

At the point in do_exit where exit_notify and thus zap_pid_ns_processes
is called I can't possibly see a use for TIF_NOTIFY_SIGNAL.
exit_task_work, exit_signals, and io_uring_cancel have all been called.

So TIF_NOTIFY_SIGNAL should be spurious at this point and safe to clear.
Why it remains set is a mystery to me.

If I had infinite time and energy the ideal is to rework the pid
namespace exit logic so that waiting for everything to exit works like
delay_group_leader in wait_consider_task: simply block reaping of the
pid namespace leader until everything in the pid namespace has been
reaped.  I think acct_exit_ns is the only piece of code that needs to
be moved to allow that, and acct_exit_ns is purely bookkeeping so it
does not affect userspace-visible semantics.

This active waiting is weird and non-standard in the kernel, and
because of that it winds up causing a problem every couple of years.

>
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@xxxxxxxxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@xxxxxxxxxxxxxxxxxxx/
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> ---
>  kernel/pid_namespace.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	 */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
> +		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>  		rc = kernel_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
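
For anyone following along, here is a rough userspace sketch of why
the one-line change matters.  It is not kernel code: thread_flags, the
two flag values, mock_wait4() and count_spins() are hypothetical
stand-ins invented for illustration, and the model assumes only what
the changelog says, namely that signal_pending() is true when either
flag is set and that the wait returns -EINTR instead of sleeping in
that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel thread flags (values are made up). */
enum { TIF_SIGPENDING = 1u << 0, TIF_NOTIFY_SIGNAL = 1u << 1 };

static unsigned int thread_flags;

static void clear_thread_flag(unsigned int flag)
{
	thread_flags &= ~flag;
}

/* Assumption from the changelog: either flag makes this return true. */
static bool signal_pending(void)
{
	return (thread_flags & (TIF_SIGPENDING | TIF_NOTIFY_SIGNAL)) != 0;
}

/*
 * Toy model of the wait: no children left gives -ECHILD; children that
 * are still running plus a pending "signal" gives -EINTR without
 * sleeping; otherwise we "sleep" until one child exits and reap it.
 */
static int mock_wait4(int *children)
{
	if (*children == 0)
		return -ECHILD;
	if (signal_pending())
		return -EINTR;
	(*children)--;
	return 1;
}

/*
 * Run the zap loop and count how often the wait returned -EINTR.
 * The cap only exists so the unfixed variant terminates in this model.
 */
static int count_spins(bool clear_notify, int children, int cap)
{
	int spins = 0;
	int rc;

	thread_flags = TIF_SIGPENDING | TIF_NOTIFY_SIGNAL;
	do {
		clear_thread_flag(TIF_SIGPENDING);
		if (clear_notify)
			clear_thread_flag(TIF_NOTIFY_SIGNAL);
		rc = mock_wait4(&children);
		if (rc == -EINTR)
			spins++;
	} while (rc != -ECHILD && spins < cap);

	return spins;
}
```

In this model count_spins(true, 3, 1000) never sees -EINTR, while
count_spins(false, 3, 1000) spins until it hits the cap, which is the
busy-wait the patch is meant to avoid.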