On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > I encountered a hard lockup when attempting to reproduce the panic issue
> > that occurred on our production servers [0]. The hard lockup is as follows,
> >
> > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
>
> This message does not exist in a vanilla kernel. It looks like an extra
> debug message. And many extra messages might create stalls on their own.

Right, dynamic_debug is enabled:

$ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control

> AFAIK, you reproduced the problem even without these extra messages.

There are two issues during the KLP transition:

1. RCU warnings
2. hard lockups

The RCU stalls can easily be reproduced without the extra messages. However,
the hard lockup cannot be reproduced unless dynamic debugging is enabled.

> At least, I see the following at
> https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@xxxxxxxxxxxxxx

That's correct; this is related to the RCU warnings issue.

> <paste>
> [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> [20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition
> [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old.
> [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old.
> [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old.
> [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> </paste>
>
> Could you please confirm that this is the original _non-filtered_ log?

Right.

> I mean that the debug messages were _not_ printed and later filtered out?

Right.

> I would like to know more about the system where RCU reported the
> stall. How many processes are running there on average?
> A rough number is enough. I wonder if it is about 1000, 10000, or
> 50000?

Most of the servers have between 5,000 and 10,000 threads.
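For context on why the thread count matters: each attempt to finish the
transition walks every thread in the system and calls klp_try_switch_task()
on it (the "is sleeping on function ..." pr_debug above is emitted from
that function). Below is a simplified sketch of that loop, based on my
reading of klp_try_complete_transition() in kernel/livepatch/transition.c;
the idle-task pass, the retry scheduling, and the completion handling are
omitted:

/*
 * Simplified sketch of klp_try_complete_transition() from
 * kernel/livepatch/transition.c; only the per-thread pass is shown.
 */
static bool klp_try_complete_transition(void)
{
        struct task_struct *g, *task;
        bool complete = true;

        /*
         * One pass over every thread in the system, with tasklist_lock
         * held for reading. Each klp_try_switch_task() call checks
         * whether the task can safely be switched to the new patch
         * state, e.g. by making sure no patched function is on its
         * stack; if not, the task is left in the old state for now.
         */
        read_lock(&tasklist_lock);
        for_each_process_thread(g, task)
                if (!klp_try_switch_task(task))
                        complete = false;
        read_unlock(&tasklist_lock);

        /* ... idle tasks, re-scheduling the retry work, completion ... */
        return complete;
}

So the cost of one pass is roughly
(number of threads) * (average klp_try_switch_task() duration), which is
what the numbers below try to estimate.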
> Also what is the CONFIG_HZ value, please?

CONFIG_HZ_PERIODIC=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

>
> Also we should get some statistics on how long klp_try_switch_task()
> lasts on average. I have never done it but I guess that
> it should be rather easy with perf. Or maybe just by looking
> at a function_graph trace.

Currently, on one of my production servers, CPU usage is around 40% and
the number of threads is approximately 9,100. The livepatch is applied
every 5 seconds. We can adjust the frequency, but the difference won't
be significant, as the RCU stalls are easy to reproduce even at very low
frequencies.

The duration of klp_try_switch_task() is as follows:

$ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |                                        |
      1024 -> 2047       : 26665    |***********                             |
      2048 -> 4095       : 93834    |****************************************|
      4096 -> 8191       : 2695     |*                                       |
      8192 -> 16383      : 268      |                                        |
     16384 -> 32767      : 24       |                                        |
     32768 -> 65535      : 2        |                                        |

avg = 2475 nsecs, total: 305745369 nsecs, count: 123489

>
> I would like to be more sure that klp_try_complete_transition() really
> could block RCU for that long. I would like to confirm that
> the following is the reality:
>
>     num_processes * average_klp_try_switch_task > 10 seconds

9100 * 2.475 / 1000 is around 22 seconds.

> If it is true then we really need to break the cycle after some
> timeout. And rcu_read_lock() is _not_ a solution because it would
> block RCU the same way.
>
> Does it make sense, please?

Makes sense.

I hope this clarifies things.

--
Regards
Yafang