On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > I encountered a hard lockup when attempting to reproduce the panic issue
> > that occurred on our production servers [0]. The hard lockup is as follows,
> >
> > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
>
> This message does not exist in a vanilla kernel. It looks like an extra
> debug message. And many extra messages might create stalls on their own.

Right, dynamic_debug is enabled:

$ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control

> AFAIK, you reproduced the problem even without these extra messages.

There are two issues during the KLP transition:

1. RCU warnings
2. hard lockups

The RCU stalls can easily be reproduced without the extra messages. However,
the hard lockup cannot be reproduced unless dynamic debugging is enabled.

> At least, I see the following at
> https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@xxxxxxxxxxxxxx

That's correct; this is related to the RCU warnings issue.

> <paste>
> [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> [20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition
> [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old.
> [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old.
> [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old.
> [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> </paste>
>
> Could you please confirm that this is the original _non-filtered_ log?

Right.

> I mean that the debug messages were _not_ printed and later filtered out?

Right.

> I would like to know more about the system where RCU reported the
> stall. How many processes are running there on average?
> A rough number is enough. I wonder if it is about 1000, 10000, or
> 50000?

Most of the servers have between 5,000 and 10,000 threads.
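For context on why the thread count matters: each attempt to finish the
transition walks every thread in the system and calls klp_try_switch_task()
on it (the "is sleeping on function ..." pr_debug above is emitted from
that function). Below is a simplified sketch of that loop, based on my
reading of klp_try_complete_transition() in kernel/livepatch/transition.c;
the idle-task pass, the retry scheduling, and the completion handling are
omitted:

/*
 * Simplified sketch of klp_try_complete_transition() from
 * kernel/livepatch/transition.c; only the per-thread pass is shown.
 */
static bool klp_try_complete_transition(void)
{
        struct task_struct *g, *task;
        bool complete = true;

        /*
         * One pass over every thread in the system, with tasklist_lock
         * held for reading. Each klp_try_switch_task() call checks
         * whether the task can safely be switched to the new patch
         * state, e.g. by making sure no patched function is on its
         * stack; if not, the task is left in the old state for now.
         */
        read_lock(&tasklist_lock);
        for_each_process_thread(g, task)
                if (!klp_try_switch_task(task))
                        complete = false;
        read_unlock(&tasklist_lock);

        /* ... idle tasks, re-scheduling the retry work, completion ... */
        return complete;
}

So the cost of one pass is roughly
(number of threads) * (average klp_try_switch_task() duration), which is
what the numbers below try to estimate.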
> Also what is the CONFIG_HZ value, please?

CONFIG_HZ_PERIODIC=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

>
> Also we should get some statistics on how long klp_try_switch_task()
> lasts on average. I have never done it but I guess that
> it should be rather easy with perf. Or maybe just by looking
> at a function_graph trace.

Currently, on one of my production servers, CPU usage is around 40% and
the number of threads is approximately 9,100. The livepatch is applied
every 5 seconds. We can adjust the frequency, but the difference won't
be significant, as the RCU stalls are easy to reproduce even at very low
frequencies.

The duration of klp_try_switch_task() is as follows:

$ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |                                        |
      1024 -> 2047       : 26665    |***********                             |
      2048 -> 4095       : 93834    |****************************************|
      4096 -> 8191       : 2695     |*                                       |
      8192 -> 16383      : 268      |                                        |
     16384 -> 32767      : 24       |                                        |
     32768 -> 65535      : 2        |                                        |

avg = 2475 nsecs, total: 305745369 nsecs, count: 123489

>
> I would like to be more sure that klp_try_complete_transition() really
> could block RCU for that long. I would like to confirm that
> the following is the reality:
>
>     num_processes * average_klp_try_switch_task > 10 seconds

9100 * 2.475 / 1000 is around 22 seconds.

> If it is true then we really need to break the cycle after some
> timeout. And rcu_read_lock() is _not_ a solution because it would
> block RCU the same way.
>
> Does it make sense, please?

Makes sense.

I hope this clarifies things.

--
Regards
Yafang