The comment above the idr_for_each_entry_continue() loop tries to explain
why we have to signal each thread in the namespace, but it is outdated.
This code no longer uses kill_proc_info(); we have a target task, so we can
check thread_group_leader() and avoid the unnecessary group_send_sig_info().
Better yet, we can change the pid_task() call to use PIDTYPE_TGID rather
than PIDTYPE_PID; this way it returns NULL if this pid is not a
group-leader pid.

Also, change this code to check SIGNAL_GROUP_EXIT: the exiting process or
thread doesn't necessarily have a pending SIGKILL. Either way, these checks
are racy without siglock, so the patch uses data_race() to shut up KCSAN.

Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
---
 kernel/pid_namespace.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 25f3cf679b35..0f9bd67c9e75 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -191,21 +191,14 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	 * The last thread in the cgroup-init thread group is terminating.
 	 * Find remaining pid_ts in the namespace, signal and wait for them
 	 * to exit.
-	 *
-	 * Note: This signals each threads in the namespace - even those that
-	 *	belong to the same thread group, To avoid this, we would have
-	 *	to walk the entire tasklist looking a processes in this
-	 *	namespace, but that could be unnecessarily expensive if the
-	 *	pid namespace has just a few processes. Or we need to
-	 *	maintain a tasklist for each pid namespace.
-	 *
 	 */
 	rcu_read_lock();
 	read_lock(&tasklist_lock);
 	nr = 2;
 	idr_for_each_entry_continue(&pid_ns->idr, pid, nr) {
-		task = pid_task(pid, PIDTYPE_PID);
-		if (task && !__fatal_signal_pending(task))
+		task = pid_task(pid, PIDTYPE_TGID);
+		/* reading signal->flags is racy without sighand->siglock */
+		if (task && !(data_race(task->signal->flags) & SIGNAL_GROUP_EXIT))
 			group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
 	}
 	read_unlock(&tasklist_lock);
-- 
2.25.1.362.g51ebf55