Hi,

The kernel version we are using is 5.4.241. We recently hit a soft lockup on the rcu_sched kthread. The kernel stack is as follows:

crash> bt
PID: 10  TASK: ffff8a83cc610000  CPU: 14  COMMAND: "rcu_sched"
 #0 [ffffc90000a64d50] machine_kexec at ffffffff8105714c
 #1 [ffffc90000a64da0] __crash_kexec at ffffffff8112c9bd
 #2 [ffffc90000a64e68] panic at ffffffff819e29d8
 #3 [ffffc90000a64ee8] watchdog_timer_fn.cold.9 at ffffffff819e9a3f
 #4 [ffffc90000a64f18] __hrtimer_run_queues at ffffffff8110b5d0
 #5 [ffffc90000a64f78] hrtimer_interrupt at ffffffff8110bdf0
 #6 [ffffc90000a64fd8] smp_apic_timer_interrupt at ffffffff81c0268a
 #7 [ffffc90000a64ff0] apic_timer_interrupt at ffffffff81c01c0f
--- <IRQ stack> ---
 #8 [ffffc9000006bdd8] apic_timer_interrupt at ffffffff81c01c0f
    [exception RIP: _raw_spin_unlock_irqrestore+14]
 #9 [ffffc9000006be80] force_qs_rnp at ffffffff810fa0b0
#10 [ffffc9000006beb8] rcu_gp_kthread at ffffffff810fd9b6
#11 [ffffc9000006bf10] kthread at ffffffff8109e40c
#12 [ffffc9000006bf50] ret_from_fork at ffffffff81c00255

It seems that a CPU did not report its quiescent state (QS) in time. From the vmcore I can see that CPU 141 is the one that did not report its QS.

The task currently running on CPU 141 is shown below; it looks like CPU 141 has itself detected the RCU stall and is printing the stall report to the serial console:

crash> bt -c 141
PID: 594039  TASK: ffff8a699dab8000  CPU: 141  COMMAND: "stress-ng-zombi"
 #0 [fffffe0001f70e40] crash_nmi_callback at ffffffff8104b54f
 #1 [fffffe0001f70e60] nmi_handle at ffffffff8101cf92
 #2 [fffffe0001f70eb8] default_do_nmi at ffffffff8101d16e
 #3 [fffffe0001f70ed8] do_nmi at ffffffff8101d351
 #4 [fffffe0001f70ef0] end_repeat_nmi at ffffffff81c015f3
    [exception RIP: vprintk_emit+492]
--- <NMI exception stack> ---
 #5 [ffffc90002038db0] vprintk_emit at ffffffff810eb19c
 #6 [ffffc90002038e00] printk at ffffffff819e6544
 #7 [ffffc90002038e60] rcu_check_gp_kthread_starvation at ffffffff819e78bd
 #8 [ffffc90002038e88] rcu_sched_clock_irq.cold.84 at ffffffff819e8021
 #9 [ffffc90002038ed0] update_process_times at ffffffff8110a8e4
#10 [ffffc90002038ee0] tick_sched_handle at ffffffff8111be22
#11 [ffffc90002038ef8] tick_sched_timer at ffffffff8111c147
#12 [ffffc90002038f18] __hrtimer_run_queues at ffffffff8110b5d0
#13 [ffffc90002038f78] hrtimer_interrupt at ffffffff8110bdf0
#14 [ffffc90002038fd8] smp_apic_timer_interrupt at ffffffff81c0268a
#15 [ffffc90002038ff0] apic_timer_interrupt at ffffffff81c01c0f
--- <IRQ stack> ---
#16 [ffffc90004227c98] apic_timer_interrupt at ffffffff81c01c0f
    [exception RIP: __rb_erase_color+34]
#17 [ffffc90004227d78] unlink_file_vma at ffffffff81241e0b
#18 [ffffc90004227da0] free_pgtables at ffffffff8123708e
#19 [ffffc90004227dd8] exit_mmap at ffffffff81243ea1
#20 [ffffc90004227e78] mmput at ffffffff81074fd4
#21 [ffffc90004227e90] do_exit at ffffffff8107d50c
#22 [ffffc90004227f08] do_group_exit at ffffffff8107de6a
#23 [ffffc90004227f30] __x64_sys_exit_group at ffffffff8107dee4
#24 [ffffc90004227f38] do_syscall_64 at ffffffff81002535
#25 [ffffc90004227f50] entry_SYSCALL_64_after_hwframe at ffffffff81c000a4

I think the sequence of events is as follows:

- The rcu_sched kthread needs to acquire the lock of the rcu_node that CPU 141 belongs to in order to force a QS on it.
- However, CPU 141 has itself detected the RCU stall and holds that rcu_node lock while printing the stall report to the serial console.
- CPU 141 does call touch_nmi_watchdog() while printing, since printing serial console logs could otherwise trigger a soft or hard lockup warning on the printing CPU itself.
- However, this does not take other CPUs into account: CPU 14, for example, may be blocked precisely because CPU 141 is holding the rcu_node lock.
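If my reading is correct, the problematic pattern on CPU 141 is roughly the following (pseudocode of my understanding of the stall-report path, not the literal 5.4 code):

	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	touch_nmi_watchdog();	/* protects only this CPU from the watchdog */
	/* slow serial output while rnp->lock is held */
	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

Meanwhile rcu_sched is spinning on the same rnp->lock in force_qs_rnp(), and nothing touches the watchdog on its behalf.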
Printing serial console logs is an expensive operation. Could we have CPU 141 only collect the information it needs to print while it is inside the rcu_node lock critical section, and then print the collected log after it has released the rcu_node lock? I am not sure whether there is a better way to solve this problem. A rough sketch of what I have in mind is below.
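To make the idea concrete (a minimal sketch only; struct stall_snapshot, collect_stall_info() and print_stall_info() are made-up names for illustration, not existing kernel APIs; only the *_rcu_node lock helpers are real):

	struct stall_snapshot snap;
	unsigned long flags;

	/* Only gather data under the rcu_node lock; no printk here. */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	collect_stall_info(rnp, &snap);
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

	/*
	 * The slow serial output happens with the lock already dropped,
	 * so rcu_sched can acquire rnp->lock and force the QS.
	 */
	print_stall_info(&snap);

This way the time spent in vprintk_emit() would no longer extend the rnp->lock hold time.

Regards,
Meng En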