On Thu, Feb 13, 2025 at 8:39 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > On Thu, Feb 13, 2025 at 8:32 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@xxxxxxxx> wrote: > > > > > > On Tue 2025-02-11 14:24:36, Yafang Shao wrote: > > > > I encountered a hard lockup when attempting to reproduce the panic issue > > > > occurred on our production servers [0]. The hard lockup is as follows, > > > > > > > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit > > > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit > > > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit > > > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit > > > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit > > > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit > > > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit > > > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit > > > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit > > > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit > > > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit > > > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit > > > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit > > > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit > > > > > > This message does not exist in vanilla kernel. It looks like an extra > > > debug message. And many extra messages might create stalls on its own. > > > > Right, the dynamic_debug is enabled: > > > > $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control > > > > > > > > AFAIK, your reproduced the problem even without these extra messages. > > > > There are two issues during the KLP transition: > > 1. RCU warnings > > 2. hard lockup > > > > RCU stalls can be easily reproduced without the extra messages. > > However, hard lockups cannot be reproduced unless dynamic debugging is > > enabled. > > > > > At least, I see the following at > > > https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@xxxxxxxxxxxxxx > > > > That's correct, this is related to the RCU warnings issue. > > > > > > > > <paste> > > > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6' > > > [20329703.340417] livepatch: 'livepatch_61_release6': starting > > > patching transition > > > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is > > > 10166 jiffies old. > > > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is > > > 10219 jiffies old. > > > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is > > > 10199 jiffies old. > > > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete > > > </paste> > > > > > > Could you please confirm that this the original _non-filtered_ log? > > > > Right. > > > > > I mean that the debug messages were _not_ printed and later filtered? > > > > Right. > > > > > > > > I would like to know more about the system where RCU reported the > > > stall. How many processes are running there in average? > > > A rough number is enough. I wonder if it is about 1000, 10000, or > > > 50000? > > > > Most of the servers have between 5,000 and 10,000 threads. > > > > > > > > Also what is the CONFIG_HZ value, please? > > > > CONFIG_HZ_PERIODIC=y > > CONFIG_HZ_1000=y > > CONFIG_HZ=1000 > > > > > > > > Also we should get some statistics how long klp_try_switch_task() > > > lasts in average. I have never did it but I guess that > > > it should be rather easy with perf. Or maybe just by looking > > > at function_graph trace. > > > > Currently, on one of my production servers, CPU usage is around 40%, > > and the number of threads is approximately 9,100. The livepatch is > > applied every 5 seconds. We can adjust the frequency, but the > > difference won't be significant, as RCU stalls are easy to reproduce > > even at very low frequencies. > > The duration of klp_try_switch_task() is as follows: > > > > $ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60 > > Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end. > > nsecs : count distribution > > 0 -> 1 : 0 | | > > 2 -> 3 : 0 | | > > 4 -> 7 : 0 | | > > 8 -> 15 : 0 | | > > 16 -> 31 : 0 | | > > 32 -> 63 : 0 | | > > 64 -> 127 : 0 | | > > 128 -> 255 : 0 | | > > 256 -> 511 : 0 | | > > 512 -> 1023 : 1 | | > > 1024 -> 2047 : 26665 |*********** | > > 2048 -> 4095 : 93834 |****************************************| > > 4096 -> 8191 : 2695 |* | > > 8192 -> 16383 : 268 | | > > 16384 -> 32767 : 24 | | > > 32768 -> 65535 : 2 | | > > > > avg = 2475 nsecs, total: 305745369 nsecs, count: 123489 > > > > > > > > I would like to be more sure that klp_try_complete_transition() really > > > could block RCU for that long. I would like to confirm that > > > the following is the reality: > > > > > > num_processes * average_klp_try_switch_task > 10second > > > > 9100 * 2.475 / 1000 is around 22 second > > It looks like I misread the nsec as usec, so the total average > duration is actually around 22ms. ;) > I’ll double-check it to confirm. I've confirmed that during RCU stalls, the average_klp_try_switch_task value isn't unusually large. However, the real issue lies in the extended duration of the klp_try_complete_transition(). - The RCU stall [Fri Feb 14 10:14:56 2025] livepatch: enabling patch 'livepatch_61_release6' [Fri Feb 14 10:14:56 2025] livepatch: 'livepatch_61_release6': starting patching transition [Fri Feb 14 10:15:10 2025] rcu_tasks_wait_gp: rcu_tasks grace period 32001 is 10073 jiffies old. [Fri Feb 14 10:15:14 2025] livepatch: 'livepatch_61_release6': patching complete - klp_try_switch_task 10:17:50 nsecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 2 | | 256 -> 511 : 5 | | 512 -> 1023 : 3 | | 1024 -> 2047 : 806 |* | 2048 -> 4095 : 30382 |****************************************| 4096 -> 8191 : 10806 |************** | 8192 -> 16383 : 65 | | 16384 -> 32767 : 9 | | avg = 3562 nsecs, total: 149912461 nsecs, count: 42078 The avg of klp_try_switch_task is only 3.5us. - number of threads Fri Feb 14 10:14:48 12106 Fri Feb 14 10:14:59 11900 Fri Feb 14 10:15:10 12004 Fri Feb 14 10:15:20 11899 Fri Feb 14 10:15:31 12293 Fri Feb 14 10:15:42 11831 At that point, the system has around 12,000 threads, so the total duration comes out to approximately 12,000 * 3.5µs, or about 42ms—well below the 10s threshold. However, the duration of klp_try_complete_transition() exceeds 10 seconds. 10:15:41 nsecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 0 | | 512 -> 1023 : 0 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 0 | | 8192 -> 16383 : 0 | | 16384 -> 32767 : 0 | | 32768 -> 65535 : 0 | | 65536 -> 131071 : 0 | | 131072 -> 262143 : 0 | | 262144 -> 524287 : 0 | | 524288 -> 1048575 : 0 | | 1048576 -> 2097151 : 0 | | 2097152 -> 4194303 : 0 | | 4194304 -> 8388607 : 0 | | 8388608 -> 16777215 : 0 | | 16777216 -> 33554431 : 0 | | 33554432 -> 67108863 : 1 |********************| 67108864 -> 134217727 : 1 |********************| 134217728 -> 268435455 : 0 | | 268435456 -> 536870911 : 0 | | 536870912 -> 1073741823 : 0 | | 1073741824 -> 2147483647 : 0 | | 2147483648 -> 4294967295 : 0 | | 4294967296 -> 8589934591 : 1 |********************| 8589934592 -> 17179869183 : 1 |********************| avg = 5630258522 nsecs, total: 22521034091 nsecs, count: 4 The longest duration of klp_try_complete_transition() ranges from 8.5 to 17.2 seconds. It appears that the RCU stall is not only driven by num_processes * average_klp_try_switch_task, but also by contention within klp_try_complete_transition(), particularly around the tasklist_lock. Interestingly, even after replacing "read_lock(&tasklist_lock)" with "rcu_read_lock()", the RCU stall persists. My verification shows that the only way to prevent the stall is by checking need_resched() during each iteration of the loop. -- Regards Yafang