Hi Wei,

thanks for more info.

On 06/06, Wei Fu wrote:
>
> > Well, due to unfortunate design zap_pid_ns_processes() can hang "forever"
> > if this namespace has a (zombie) task injected from the parent ns, this
> > task should be reaped by its parent.
>
> That zombie task was cloned by pid-1 process in that pid namespace. In my last
> reproduced log, the process tree in that pid namespace looks like

OK,

> ```
> # unshare(CLONE_NEWPID | CLONE_NEWNS)
>
> npm start (pid 2522045)
> |__npm run zombie (pid 2522605)
>    |__ sh -c "whle true; do echo zombie; sleep 1; done" (pid 2522869)
> ```

only 3 processes? nothing is running? Is the last process 2522869 a zombie too?

Could you show your .config? In particular, CONFIG_PREEMPT...

> The `npm start (pid 2522045)` was stuck in kernel_wait4. And its child,

so this is the init task in this namespace,

> `npm run zombie (pid 2522605)`, has two threads. One of them was in D status.
...
> $ sudo cat /proc/2522605/task/*/stack
> [<0>] synchronize_rcu_expedited+0x177/0x1f0
> [<0>] namespace_unlock+0xd6/0x1b0
> [<0>] put_mnt_ns+0x73/0xa0
> [<0>] free_nsproxy+0x1c/0x1b0
> [<0>] switch_task_namespaces+0x5d/0x70
> [<0>] exit_task_namespaces+0x10/0x20
> [<0>] do_exit+0x2ce/0x500
> [<0>] io_sq_thread+0x48e/0x5a0
> [<0>] ret_from_fork+0x3c/0x60
> [<0>] ret_from_fork_asm+0x1b/0x30

so I guess this is the trace of its sub-thread 2522645. What about the
process 2522605? Has it exited too?

> > But zap_pid_ns_processes() shouldn't cause the soft-lockup, it should
> > sleep in kernel_wait4().
>
> I run `cat /proc/2522045/status` and found that the status was kept switching
> between running and sleeping.

OK, this shouldn't happen in this case. So it really looks like it spins
in a busy-wait loop because TIF_NOTIFY_SIGNAL is not cleared. It can be
reported as sleeping because do_wait() sets/clears TASK_INTERRUPTIBLE, although
the window is small...

Oleg.
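
A rough way to quantify the running/sleeping switching seen with `cat /proc/2522045/status` is to sample the `State:` line repeatedly and count the results. The script below is only a sketch and not part of the original report: the helper names (`parse_state`, `read_state`, `sample_states`), the sample count and the interval are all made up for illustration. It relies only on the `State:` field of /proc/<pid>/status.

```python
import sys
import time
from collections import Counter


def parse_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tS (sleeping)"; keep only the letter.
            return line.split()[1]
    raise ValueError("no State: line found")


def read_state(pid):
    """Read /proc/<pid>/status once and return the current state code."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read())


def sample_states(pid, samples=200, interval=0.01):
    """Poll the state `samples` times and count how often each code is seen.

    `samples` and `interval` are arbitrary illustrative defaults.
    """
    counts = Counter()
    for _ in range(samples):
        counts[read_state(pid)] += 1
        time.sleep(interval)
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python3 sample_state.py 2522045
    print(dict(sample_states(int(sys.argv[1]))))
```

If the task is mostly reported as R with only an occasional S, that would be consistent with the busy-wait theory (the TASK_INTERRUPTIBLE window in do_wait() is small); mostly S would suggest it really sleeps in kernel_wait4(). This is only a hint, not proof either way.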