On Thu, Feb 13, 2025 at 8:32 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > > I encountered a hard lockup when attempting to reproduce the panic
> > > issue that occurred on our production servers [0]. The hard lockup
> > > is as follows:
> > >
> > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> >
> > This message does not exist in the vanilla kernel. It looks like an
> > extra debug message.
> > And many extra messages might create stalls on their own.
>
> Right, dynamic_debug is enabled:
>
> $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
>
> > AFAIK, you reproduced the problem even without these extra messages.
>
> There are two issues during the KLP transition:
>
> 1. RCU warnings
> 2. hard lockups
>
> RCU stalls can be easily reproduced without the extra messages.
> However, hard lockups cannot be reproduced unless dynamic debugging is
> enabled.
>
> > At least, I see the following at
> > https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@xxxxxxxxxxxxxx
>
> That's correct; this is related to the RCU warnings issue.
>
> > <paste>
> > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> > [20329703.340417] livepatch: 'livepatch_61_release6': starting
> > patching transition
> > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> > 10166 jiffies old.
> > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> > 10219 jiffies old.
> > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> > 10199 jiffies old.
> > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> > </paste>
> >
> > Could you please confirm that this is the original _non-filtered_ log?
>
> Right.
>
> > I mean, were the debug messages _not_ printed and later filtered out?
>
> Right.
>
> > I would like to know more about the system where RCU reported the
> > stall. How many processes are running there on average?
> > A rough number is enough. I wonder if it is about 1000, 10000, or
> > 50000?
>
> Most of the servers have between 5,000 and 10,000 threads.
>
> > Also, what is the CONFIG_HZ value, please?
>
> CONFIG_HZ_PERIODIC=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> > Also, we should get some statistics on how long klp_try_switch_task()
> > takes on average. I have never done it, but I guess that
> > it should be rather easy with perf.
> > Or maybe just by looking
> > at a function_graph trace.
>
> Currently, on one of my production servers, CPU usage is around 40%,
> and the number of threads is approximately 9,100. The livepatch is
> applied every 5 seconds. We can adjust the frequency, but the
> difference won't be significant, as RCU stalls are easy to reproduce
> even at very low frequencies.
>
> The duration of klp_try_switch_task() is as follows:
>
> $ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
> Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.
>      nsecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 0        |                                        |
>         32 -> 63         : 0        |                                        |
>         64 -> 127        : 0        |                                        |
>        128 -> 255        : 0        |                                        |
>        256 -> 511        : 0        |                                        |
>        512 -> 1023       : 1        |                                        |
>       1024 -> 2047       : 26665    |***********                             |
>       2048 -> 4095       : 93834    |****************************************|
>       4096 -> 8191       : 2695     |*                                       |
>       8192 -> 16383      : 268      |                                        |
>      16384 -> 32767      : 24       |                                        |
>      32768 -> 65535      : 2        |                                        |
>
> avg = 2475 nsecs, total: 305745369 nsecs, count: 123489
>
> > I would like to be more sure that klp_try_complete_transition() really
> > could block RCU for that long. I would like to confirm that
> > the following is the reality:
> >
> >     num_processes * average_klp_try_switch_task > 10 seconds
>
> 9100 * 2.475 / 1000 is around 22 seconds

It looks like I misread the nsecs as usecs, so the total average
duration is actually around 22 ms. ;)
I'll double-check it to confirm.

> > If it is true, then we really need to break the cycle after some
> > timeout. And rcu_read_lock() is _not_ a solution because it would
> > block RCU the same way.
> >
> > Does it make sense, please?
>
> Makes sense.
> I hope this clarifies things.
>
> --
> Regards
> Yafang

--
Regards
Yafang