Re: [RCU] zombie task hung in synchronize_rcu_expedited

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi!

> Add Eric.
> 
> Well, due to unfortunate design zap_pid_ns_processes() can hang "forever"
> if this namespace has a (zombie) task injected from the parent ns, this
> task should be reaped by its parent.

That zombie task was cloned by pid-1 process in that pid namespace. In my last
reproduced log, the process tree in that pid namespace looks like

```
# unshare(CLONE_NEWPID | CLONE_NEWNS)

npm start (pid 2522045)
    |__npm run zombie (pid 2522605)
       |__ sh -c "whle true; do echo zombie; sleep 1; done" (pid 2522869)
```

The `npm start (pid 2522045)` was stuck in kernel_wait4. And its child,
`npm run zombie (pid 2522605)`, has two threads. One of them was in D status.
As far as I know, pid-2522605 can't be reaped by its parent pid-2522045 until
that thread returns from `synchronize_rcu_expedited`.

```
$ sudo cat /proc/2522605/task/*/stack
[<0>] synchronize_rcu_expedited+0x177/0x1f0
[<0>] namespace_unlock+0xd6/0x1b0
[<0>] put_mnt_ns+0x73/0xa0
[<0>] free_nsproxy+0x1c/0x1b0
[<0>] switch_task_namespaces+0x5d/0x70
[<0>] exit_task_namespaces+0x10/0x20
[<0>] do_exit+0x2ce/0x500
[<0>] io_sq_thread+0x48e/0x5a0
[<0>] ret_from_fork+0x3c/0x60
[<0>] ret_from_fork_asm+0x1b/0x30

$ sudo cat /proc/2522605/task/2522645/status
Name: iou-sqp-2522605
State: D (disk sleep)
Tgid: 2522605
Ngid: 0
Pid: 2522645
PPid: 2522045
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 0
Groups: 1000
NStgid: 2522605 25
NSpid: 2522645 40
NSpgid: 2522045 1
NSsid: 2522045 1
Kthread: 0
Threads: 2
SigQ: 0/128311
SigPnd: 0000000000000000
ShdPnd: 0000000000000100
SigBlk: fffffffffffbfeff
SigIgn: 0000000001001000
SigCgt: 0000000000014602
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 2
Seccomp_filters: 1
Speculation_Store_Bypass: vulnerable
SpeculationIndirectBranch: always enabled
Cpus_allowed: ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 11
nonvoluntary_ctxt_switches: 21
```

> 
> But zap_pid_ns_processes() shouldn't cause the soft-lockup, it should
> sleep in kernel_wait4().

I run `cat /proc/2522045/status` and found that the status was kept switching
between running and sleeping. But the kernel was still reporting soft lockup.
And there is log from dmesg. The CPU 5 wasn't able to report the quiescent
state. It seems that [rcu_flavor_sched_clock_irq][1] wasn't able to call
[run_qs][2].

```
rcu: INFO: rcu_sched self-detected stall on CPU
rcu:         5-....: (15000 ticks this GP) idle=db4c/1/0x4000000000000000 softirq=14924115/14924115 fqs=7430
rcu:                  hardirqs   softirqs   csw/system
rcu:          number:        0        833            0
rcu:         cputime:        0          0        29996   ==> 30000(ms)
rcu:         (t=15003 jiffies g=44379053 q=145851 ncpus=8)
CPU: 5 PID: 2522045 Comm: npm start Tainted: G             L     6.5.0-1021-azure #22~22.04.1-Ubuntu
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
RIP: 0010:_raw_spin_unlock_irqrestore+0x19/0x20
Code: cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 e8 62 06 00 00 90 f7 c6 00 02 00 00 74 01 fb 5d <e9> d2 19 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffa666c4bafc30 EFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffa666c4bafcc0 RCX: 0000000000000020
RDX: ffff8a3d82130928 RSI: 0000000000000282 RDI: ffff8a3d82130920
RBP: ffffa666c4bafc48 R08: ffff8a3d82130928 R09: ffff8a3d82130928
R10: 0000000000000040 R11: 0000000000000002 R12: ffff8a3d82130920
R13: ffff8a44f3db9980 R14: ffff8a44f3db9980 R15: ffff8a44f3db9970
FS:  0000000000000000(0000) GS:ffff8a451fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000767ea57cc000 CR3: 00000005db436002 CR4: 0000000000370ee0
Call Trace:
 <IRQ>
 ? show_regs+0x6a/0x80
 ? dump_cpu_task+0x71/0x90
 ? rcu_dump_cpu_stacks+0xe8/0x180
 ? print_cpu_stall+0x131/0x290
 ? load_balance+0x160/0x870
 ? check_cpu_stall+0x1d8/0x270
 ? rcu_pending+0x32/0x1e0
 ? rcu_sched_clock_irq+0x16e/0x290
 ? update_process_times+0x63/0xa0
 ? tick_sched_handle+0x28/0x70
 ? tick_sched_timer+0x77/0x90
 ? __pfx_tick_sched_timer+0x10/0x10
 ? __hrtimer_run_queues+0x111/0x240
 ? srso_alias_return_thunk+0x5/0x7f
 ? hrtimer_interrupt+0x101/0x240
 ? hv_stimer0_isr+0x20/0x30
 ? __sysvec_hyperv_stimer0+0x32/0x70
 ? sysvec_hyperv_stimer0+0x7b/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_hyperv_stimer0+0x1b/0x20
 ? _raw_spin_unlock_irqrestore+0x19/0x20
 ? remove_wait_queue+0x47/0x50
 do_wait+0x19f/0x300
 kernel_wait4+0xaf/0x150
 ? __pfx_child_wait_callback+0x10/0x10
 zap_pid_ns_processes+0x105/0x190
 forget_original_parent+0x2e4/0x360
 exit_notify+0x4a/0x210
 do_exit+0x30b/0x500
 ? srso_alias_return_thunk+0x5/0x7f
 ? wake_up_state+0x10/0x20
 ? srso_alias_return_thunk+0x5/0x7f
 do_group_exit+0x35/0x90
 __x64_sys_exit_group+0x18/0x20
 x64_sys_call+0xd95/0x1ff0
 do_syscall_64+0x56/0x80
 ? srso_alias_return_thunk+0x5/0x7f
 ? handle_mm_fault+0x128/0x290
 ? srso_alias_return_thunk+0x5/0x7f
 ? srso_alias_return_thunk+0x5/0x7f
 ? exit_to_user_mode_prepare+0x49/0x100
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit_to_user_mode+0x19/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit+0x1d/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x80/0x160
 entry_SYSCALL_64_after_hwframe+0x73/0xdd
RIP: 0033:0x75fce9367f8e
Code: Unable to access opcode bytes at 0x75fce9367f64.
RSP: 002b:00007ffc80c04b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 000075fce672d1b0 RCX: 000075fce9367f8e
RDX: 000075fce93e30c0 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 00007ffc80c04b60 R08: 0000000000000024 R09: 0000000800000000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000001
R13: 000075fce9081a90 R14: 0000000000000000 R15: 000075fce6771d50
 </TASK>

rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 5-.... } 15359 jiffies s: 90777 root: 0x20/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 4 to CPUs 5:
NMI backtrace for cpu 5
CPU: 5 PID: 2522045 Comm: npm start Tainted: G             L     6.5.0-1021-azure #22~22.04.1-Ubuntu
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
RIP: 0010:do_wait+0x11/0x300
Code: 8b 4d d4 e9 28 fd ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 <53> 48 89 fb 48 83 ec 08 48 8b 77 08 0f 1f 44 00 00 65 4c 8b 34 25
RSP: 0018:ffffa666c4bafc68 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000040000004 RCX: 0000000000000000
RDX: 0000000040000000 RSI: 0000000000000000 RDI: ffffa666c4bafc98
RBP: ffffa666c4bafc88 R08: ffff8a3d82130928 R09: ffff8a3d82130928
R10: 0000000000000040 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8a451fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000767ea57cc000 CR3: 00000005db436002 CR4: 0000000000370ee0
Call Trace:
 <NMI>
 ? show_regs+0x6a/0x80
 ? nmi_cpu_backtrace+0x9c/0x100
 ? nmi_cpu_backtrace_handler+0x11/0x20
 ? nmi_handle+0x62/0x160
 ? default_do_nmi+0x45/0x120
 ? exc_nmi+0x19f/0x250
 ? end_repeat_nmi+0x16/0x67
 ? do_wait+0x11/0x300
 ? do_wait+0x11/0x300
 ? do_wait+0x11/0x300
 </NMI>
 <TASK>
 kernel_wait4+0xaf/0x150
 ? __pfx_child_wait_callback+0x10/0x10
 zap_pid_ns_processes+0x105/0x190
 forget_original_parent+0x2e4/0x360
 exit_notify+0x4a/0x210
 do_exit+0x30b/0x500
 ? srso_alias_return_thunk+0x5/0x7f
 ? wake_up_state+0x10/0x20
 ? srso_alias_return_thunk+0x5/0x7f
 do_group_exit+0x35/0x90
 __x64_sys_exit_group+0x18/0x20
 x64_sys_call+0xd95/0x1ff0
 do_syscall_64+0x56/0x80
 ? srso_alias_return_thunk+0x5/0x7f
 ? handle_mm_fault+0x128/0x290
 ? srso_alias_return_thunk+0x5/0x7f
 ? srso_alias_return_thunk+0x5/0x7f
 ? exit_to_user_mode_prepare+0x49/0x100
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit_to_user_mode+0x19/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit+0x1d/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x80/0x160
 entry_SYSCALL_64_after_hwframe+0x73/0xdd
RIP: 0033:0x75fce9367f8e
Code: Unable to access opcode bytes at 0x75fce9367f64.
RSP: 002b:00007ffc80c04b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 000075fce672d1b0 RCX: 000075fce9367f8e
RDX: 000075fce93e30c0 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 00007ffc80c04b60 R08: 0000000000000024 R09: 0000000800000000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000001
R13: 000075fce9081a90 R14: 0000000000000000 R15: 000075fce6771d50
 </TASK>


watchdog: BUG: soft lockup - CPU#5 stuck for 85s! [npm start:2522045]
Modules linked in: tls raw_diag unix_diag af_packet_diag netlink_diag udp_diag tcp_diag inet_diag xt_statistic xt_mark veth xt_comment xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_nat xt_MASQUERADE nft_chain_nat nf_nat nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay binfmt_misc nls_iso8859_1 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_owner xt_tcpudp nft_compat crct10dif_pclmul crc32_pclmul nf_tables polyval_clmulni polyval_generic ghash_clmulni_intel libcrc32c sha256_ssse3 joydev sha1_ssse3 hid_generic nfnetlink aesni_intel crypto_simd hyperv_drm cryptd hid_hyperv serio_raw drm_kms_helper hv_netvsc hid hyperv_keyboard pata_acpi drm_shmem_helper dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel msr drm efi_pstore i2c_core ip_tables x_tables autofs4
CPU: 5 PID: 2522045 Comm: npm start Tainted: G             L     6.5.0-1021-azure #22~22.04.1-Ubuntu
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
RIP: 0010:_raw_spin_unlock_irqrestore+0x19/0x20
Code: cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 e8 62 06 00 00 90 f7 c6 00 02 00 00 74 01 fb 5d <e9> d2 19 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffa666c4bafc30 EFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffa666c4bafcc0 RCX: 0000000000000020
RDX: ffff8a3d82130928 RSI: 0000000000000282 RDI: ffff8a3d82130920
RBP: ffffa666c4bafc48 R08: ffff8a3d82130928 R09: ffff8a3d82130928
R10: 0000000000000040 R11: 0000000000000002 R12: ffff8a3d82130920
R13: ffff8a44f3db9980 R14: ffff8a44f3db9980 R15: ffff8a44f3db9970
FS:  0000000000000000(0000) GS:ffff8a451fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000767ea57cc000 CR3: 00000005db436002 CR4: 0000000000370ee0
Call Trace:
 <IRQ>
 ? show_regs+0x6a/0x80
 ? watchdog_timer_fn+0x1ce/0x230
 ? __pfx_watchdog_timer_fn+0x10/0x10
 ? __hrtimer_run_queues+0x111/0x240
 ? srso_alias_return_thunk+0x5/0x7f
 ? hrtimer_interrupt+0x101/0x240
 ? hv_stimer0_isr+0x20/0x30
 ? __sysvec_hyperv_stimer0+0x32/0x70
 ? sysvec_hyperv_stimer0+0x7b/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_hyperv_stimer0+0x1b/0x20
 ? _raw_spin_unlock_irqrestore+0x19/0x20
 ? remove_wait_queue+0x47/0x50
 do_wait+0x19f/0x300
 kernel_wait4+0xaf/0x150
 ? __pfx_child_wait_callback+0x10/0x10
 zap_pid_ns_processes+0x105/0x190
 forget_original_parent+0x2e4/0x360
 exit_notify+0x4a/0x210
 do_exit+0x30b/0x500
 ? srso_alias_return_thunk+0x5/0x7f
 ? wake_up_state+0x10/0x20
 ? srso_alias_return_thunk+0x5/0x7f
 do_group_exit+0x35/0x90
 __x64_sys_exit_group+0x18/0x20
 x64_sys_call+0xd95/0x1ff0
 do_syscall_64+0x56/0x80
 ? srso_alias_return_thunk+0x5/0x7f
 ? handle_mm_fault+0x128/0x290
 ? srso_alias_return_thunk+0x5/0x7f
 ? srso_alias_return_thunk+0x5/0x7f
 ? exit_to_user_mode_prepare+0x49/0x100
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit_to_user_mode+0x19/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? irqentry_exit+0x1d/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x80/0x160
 entry_SYSCALL_64_after_hwframe+0x73/0xdd
RIP: 0033:0x75fce9367f8e
Code: Unable to access opcode bytes at 0x75fce9367f64.
RSP: 002b:00007ffc80c04b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 000075fce672d1b0 RCX: 000075fce9367f8e
RDX: 000075fce93e30c0 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 00007ffc80c04b60 R08: 0000000000000024 R09: 0000000800000000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000001
R13: 000075fce9081a90 R14: 0000000000000000 R15: 000075fce6771d50
 </TASK>
```

> 
> Any chance you can test the patch below? This patch makes sense anyway,
> I'll send it later. But I am not sure it can fix your problem.

Sure! Will do! Thanks

> 
> Oleg.
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	 */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
> +		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>  		rc = kernel_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>  
> 
> 

Wei, Fu

[1]: https://elixir.bootlin.com/linux/v6.5/source/kernel/rcu/tree_plugin.h#L964
[2]: https://elixir.bootlin.com/linux/v6.5/source/kernel/rcu/tree_plugin.h#L848




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux