On 11/15, Yonghong Song wrote:
>
> On 11/14/23 11:32 AM, Oleg Nesterov wrote:
> >@@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
> >                 return NULL;
> > retry:
> >-        task = next_thread(task);
> >+        task = __next_thread(task);
> >+        if (!task)
> >+                return NULL;
> >         next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
> >-        if (!next_tid || next_tid == common->pid) {
> >-                /* Run out of tasks of a process.  The tasks of a
> >-                 * thread_group are linked as circular linked list.
> >-                 */
> >-                return NULL;
> >-        }
> >+        if (!next_tid)
> >+                goto retry;
>
> Look at the code. Looks like next_tid should never be 0 ...
> pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>                         struct pid_namespace *ns)
> {
>         pid_t nr = 0;
>
>         rcu_read_lock();
>         if (!ns)
>                 ns = task_active_pid_ns(current);
>         nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^

Please note that *task_pid_ptr(task, type) can be NULL if this task has
already exited and called detach_pid().

detach_pid() does __change_pid(task, type, NULL), please note the

        *pid_ptr = new; // NULL in this case

assignment in __change_pid().

IOW, the problem is not that ns can change, the problem is that
task->thread_pid (and other pid links) can be NULL, and in this case
pid_nr_ns() returns zero.

This code should be rewritten from the very beginning, it should not
rely on pid_nr. If nothing else, common->pid and/or pid_visiting can be
reused. But currently my only concern is next_thread().

> Other than above, the change looks good to me.

Thanks for review!

Oleg.
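
For illustration only, here is a minimal user-space sketch of the point above: once the pid link of an exited task has been cleared, the pid lookup yields 0. All model_* names are hypothetical stand-ins, not kernel definitions. The real pid_nr_ns() additionally checks namespace levels, and the real code runs under RCU; both are left out here.

```c
#include <stddef.h>

/* Simplified user-space stand-ins; NOT the kernel's struct pid/task_struct. */
struct model_pid {
        int nr;
};

struct model_task {
        struct model_pid *thread_pid;   /* NULL once the task is detached */
};

/* Models the NULL check in pid_nr_ns(): no struct pid means nr == 0. */
int model_pid_nr(const struct model_pid *pid)
{
        return pid ? pid->nr : 0;
}

/* Models the effect of detach_pid(): the task's pid link becomes NULL. */
void model_detach_pid(struct model_task *task)
{
        task->thread_pid = NULL;
}

/* Returns 1 if the lookup is nonzero before detach and zero after it. */
int model_exited_task_reports_zero(void)
{
        struct model_pid pid = { .nr = 42 };
        struct model_task task = { .thread_pid = &pid };
        int before, after;

        before = model_pid_nr(task.thread_pid);
        model_detach_pid(&task);
        after = model_pid_nr(task.thread_pid);

        return before == 42 && after == 0;
}
```

In this model a zero result means "this task has already been detached", which is why the patched loop retries with the next thread rather than treating 0 as the end of the thread group.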