Re: [PATCH] livepatch: Avoid hard lockup caused by klp_try_switch_task()

On Wed, Jan 22, 2025 at 8:50 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Wed 2025-01-22 16:51:46, Yafang Shao wrote:
> > I encountered a hard lockup while attempting to reproduce the panic issue
> > that occurred on our production servers [0]. The hard lockup manifests as
> > follows:
> >
> > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> > [15852778.412319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
> > ...
> > [15852778.412374] CPU: 15 PID: 1 Comm: systemd Kdump: loaded Tainted: G S      W  O  K    6.1.52-3
> > [15852778.412377] Hardware name: New H3C Technologies Co., Ltd. H3C UniServer R4950 G5/RS45M2C9S, BIOS 5.12 10/15/2021
> > [15852778.412378] RIP: 0010:queued_write_lock_slowpath+0x75/0x135
> > ...
> > [15852778.412397] Call Trace:
> > [15852778.412398]  <NMI>
> > [15852778.412400]  ? show_regs.cold+0x1a/0x1f
> > [15852778.412403]  ? watchdog_overflow_callback.cold+0x1e/0x70
> > [15852778.412406]  ? __perf_event_overflow+0x102/0x1e0
> > [15852778.412409]  ? perf_event_overflow+0x19/0x20
> > [15852778.412411]  ? x86_pmu_handle_irq+0xf7/0x160
> > [15852778.412415]  ? flush_tlb_one_kernel+0xe/0x30
> > [15852778.412418]  ? __set_pte_vaddr+0x2d/0x40
> > [15852778.412421]  ? set_pte_vaddr_p4d+0x3d/0x50
> > [15852778.412423]  ? set_pte_vaddr+0x6d/0xa0
> > [15852778.412424]  ? __native_set_fixmap+0x28/0x40
> > [15852778.412426]  ? native_set_fixmap+0x54/0x60
> > [15852778.412428]  ? ghes_copy_tofrom_phys+0x75/0x120
> > [15852778.412431]  ? __ghes_peek_estatus.isra.0+0x4e/0xb0
> > [15852778.412434]  ? ghes_in_nmi_queue_one_entry.constprop.0+0x3d/0x240
> > [15852778.412437]  ? amd_pmu_handle_irq+0x48/0xc0
> > [15852778.412438]  ? perf_event_nmi_handler+0x2d/0x50
> > [15852778.412440]  ? nmi_handle+0x60/0x120
> > [15852778.412443]  ? default_do_nmi+0x45/0x120
> > [15852778.412446]  ? exc_nmi+0x118/0x150
> > [15852778.412447]  ? end_repeat_nmi+0x16/0x67
> > [15852778.412450]  ? copy_process+0xf01/0x19f0
> > [15852778.412452]  ? queued_write_lock_slowpath+0x75/0x135
> > [15852778.412455]  ? queued_write_lock_slowpath+0x75/0x135
> > [15852778.412457]  ? queued_write_lock_slowpath+0x75/0x135
> > [15852778.412459]  </NMI>
> > [15852778.412460]  <TASK>
> > [15852778.412461]  _raw_write_lock_irq+0x43/0x50
> > [15852778.412463]  copy_process+0xf01/0x19f0
> > [15852778.412466]  kernel_clone+0x9d/0x3e0
> > [15852778.412468]  ? autofs_dev_ioctl_requester+0x100/0x100
> > [15852778.412471]  __do_sys_clone+0x66/0x90
> > [15852778.412475]  __x64_sys_clone+0x25/0x30
> > [15852778.412477]  do_syscall_64+0x38/0x90
> > [15852778.412478]  entry_SYSCALL_64_after_hwframe+0x64/0xce
> > [15852778.412481] RIP: 0033:0x7f426bb3b9c1
> > ...
> >
> > Notably, dynamic_debug is enabled to collect debug information when
> > applying a livepatch, resulting in a large amount of debug output.
> >
> > The issue arises because klp_try_switch_task() runs under tasklist_lock,
> > so any task attempting to acquire that lock must spin until it is
> > released. This becomes problematic in the copy_process() path, which
> > takes the lock for writing with IRQs disabled, leading to the hard
> > lockup. To prevent this, we should check for rwlock contention before
> > proceeding.
> >
> > Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@xxxxxxxxxxxxxx/ [0]
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  kernel/livepatch/transition.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index ba069459c101..774017825bb4 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -467,9 +467,14 @@ void klp_try_complete_transition(void)
> >        * unless the patch includes changes to a very common function.
> >        */
> >       read_lock(&tasklist_lock);
> > -     for_each_process_thread(g, task)
> > +     for_each_process_thread(g, task) {
> >               if (!klp_try_switch_task(task))
> >                       complete = false;
> > +             if (rwlock_is_contended(&tasklist_lock) || need_resched()) {
>
> Are you able to finish the livepatch transition with this patch?

Yes, I've deployed this change to a few test servers, and the livepatch
transition still completes correctly on them. These servers run the same
workload that triggered the panic. Since klp_try_complete_transition() is
retried from the delayed work, breaking out of the loop early only defers
the remaining tasks to the next attempt.

>
> > +                     complete = false;
> > +                     break;
> > +             }
> > +     }
> >       read_unlock(&tasklist_lock);
> >
> >       /*
>
> With this patch, any operation which takes the tasklist_lock might
> break klp_try_complete_transition(). I am afraid that this might
> block the transition for a long time on huge systems with some
> specific loads.
>
> And the problem is caused by a printk() added just for debugging.
> I wonder if you even use a slow serial port.

No, we don't use a slow console.

The console is:

$ cat /proc/cmdline
... console=tty0 ...

The log starts with:
[15852771.495313] livepatch: 'livepatch_61_release6': starting patching transition
[15852771.500341] livepatch: klp_try_switch_task: kcompactd0:664 is running
[15852771.516951] livepatch: klp_try_switch_task: java:2236329 is running
[15852771.522891] livepatch: klp_try_switch_task: python:338825 is sleeping on function do_exit
[15852771.526292] livepatch: klp_try_switch_task: jemalloc_bg_thd:338826 is sleeping on function do_exit
....

And ends with:

[15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
[15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
[15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
[15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
[15852778.412319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15

$ cat log | grep do_exit | wc -l
1061

It seems that there were simply too many threads sleeping in do_exit() at
that moment.
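
For context, the two code paths contending here are both visible in the
trace (abbreviated from the upstream sources):

	/* kernel/livepatch/transition.c: klp_try_complete_transition() */
	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))	/* pr_debug() for each task
						   that cannot be switched */
			complete = false;
	read_unlock(&tasklist_lock);

	/* kernel/fork.c: copy_process() */
	write_lock_irq(&tasklist_lock);	/* spins with IRQs disabled until
					   all readers are gone */

With ~1061 debug messages pushed to the tty console while the read lock is
held, the writer spins with IRQs disabled long enough to trip the NMI
watchdog.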

>
> You might try to use printk_deferred() instead. Also you might need
> to disable interrupts around the read_lock()/read_unlock() to
> make sure that the console handling will be deferred after
> the tasklist_lock gets released.
>
> Anyway, I am against this patch.

However, there is still a risk of triggering a hard lockup if a large
number of tasks are involved.
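
For reference, a rough sketch of what Petr's suggestion might look like
(untested; whether printk_deferred() may be used here, and the exact call
sites, are my assumptions):

	/* In klp_try_switch_task(): defer the debug output instead of
	 * printing it synchronously under tasklist_lock: */
	printk_deferred(KERN_DEBUG "%s: %s:%d is sleeping on function %s\n",
			__func__, task->comm, task->pid, old_name);

	/* In klp_try_complete_transition(): keep IRQs disabled across the
	 * walk so the deferred messages can only be flushed after
	 * tasklist_lock has been dropped: */
	unsigned long flags;

	local_irq_save(flags);
	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	read_unlock(&tasklist_lock);
	local_irq_restore(flags);	/* deferred output flushes from here */

The irq_work that flushes printk_deferred() output only runs once interrupts
are enabled again, which is why the local_irq_save()/local_irq_restore()
pair matters.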


--
Regards
Yafang