Re: [PATCH] livepatch: Avoid hard lockup caused by klp_try_switch_task()

Petr Mladek <pmladek@xxxxxxxx> · Wed, 22 Jan 2025 13:50:48 +0100

On Wed 2025-01-22 16:51:46, Yafang Shao wrote:
> I encountered a hard lockup while attempting to reproduce the panic issue
> that occurred on our production servers [0]. The hard lockup manifests as
> follows:
> 
> [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> [15852778.412319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
> ...
> [15852778.412374] CPU: 15 PID: 1 Comm: systemd Kdump: loaded Tainted: G S      W  O  K    6.1.52-3
> [15852778.412377] Hardware name: New H3C Technologies Co., Ltd. H3C UniServer R4950 G5/RS45M2C9S, BIOS 5.12 10/15/2021
> [15852778.412378] RIP: 0010:queued_write_lock_slowpath+0x75/0x135
> ...
> [15852778.412397] Call Trace:
> [15852778.412398]  <NMI>
> [15852778.412400]  ? show_regs.cold+0x1a/0x1f
> [15852778.412403]  ? watchdog_overflow_callback.cold+0x1e/0x70
> [15852778.412406]  ? __perf_event_overflow+0x102/0x1e0
> [15852778.412409]  ? perf_event_overflow+0x19/0x20
> [15852778.412411]  ? x86_pmu_handle_irq+0xf7/0x160
> [15852778.412415]  ? flush_tlb_one_kernel+0xe/0x30
> [15852778.412418]  ? __set_pte_vaddr+0x2d/0x40
> [15852778.412421]  ? set_pte_vaddr_p4d+0x3d/0x50
> [15852778.412423]  ? set_pte_vaddr+0x6d/0xa0
> [15852778.412424]  ? __native_set_fixmap+0x28/0x40
> [15852778.412426]  ? native_set_fixmap+0x54/0x60
> [15852778.412428]  ? ghes_copy_tofrom_phys+0x75/0x120
> [15852778.412431]  ? __ghes_peek_estatus.isra.0+0x4e/0xb0
> [15852778.412434]  ? ghes_in_nmi_queue_one_entry.constprop.0+0x3d/0x240
> [15852778.412437]  ? amd_pmu_handle_irq+0x48/0xc0
> [15852778.412438]  ? perf_event_nmi_handler+0x2d/0x50
> [15852778.412440]  ? nmi_handle+0x60/0x120
> [15852778.412443]  ? default_do_nmi+0x45/0x120
> [15852778.412446]  ? exc_nmi+0x118/0x150
> [15852778.412447]  ? end_repeat_nmi+0x16/0x67
> [15852778.412450]  ? copy_process+0xf01/0x19f0
> [15852778.412452]  ? queued_write_lock_slowpath+0x75/0x135
> [15852778.412455]  ? queued_write_lock_slowpath+0x75/0x135
> [15852778.412457]  ? queued_write_lock_slowpath+0x75/0x135
> [15852778.412459]  </NMI>
> [15852778.412460]  <TASK>
> [15852778.412461]  _raw_write_lock_irq+0x43/0x50
> [15852778.412463]  copy_process+0xf01/0x19f0
> [15852778.412466]  kernel_clone+0x9d/0x3e0
> [15852778.412468]  ? autofs_dev_ioctl_requester+0x100/0x100
> [15852778.412471]  __do_sys_clone+0x66/0x90
> [15852778.412475]  __x64_sys_clone+0x25/0x30
> [15852778.412477]  do_syscall_64+0x38/0x90
> [15852778.412478]  entry_SYSCALL_64_after_hwframe+0x64/0xce
> [15852778.412481] RIP: 0033:0x7f426bb3b9c1
> ...
> 
> Notably, dynamic_debug is enabled to collect debug information when
> applying a livepatch, resulting in a large amount of debug output.
> 
> The issue arises because klp_try_switch_task() holds the tasklist_lock, and
> if another task attempts to acquire it, it must spin until it's available.
> This becomes problematic in the copy_process() path, where IRQs are
> disabled, leading to the hard lockup. To prevent this, we should implement
> a check for spinlock contention before proceeding.
> 
> Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@xxxxxxxxxxxxxx/ [0]
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> ---
>  kernel/livepatch/transition.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index ba069459c101..774017825bb4 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -467,9 +467,14 @@ void klp_try_complete_transition(void)
>  	 * unless the patch includes changes to a very common function.
>  	 */
>  	read_lock(&tasklist_lock);
> -	for_each_process_thread(g, task)
> +	for_each_process_thread(g, task) {
>  		if (!klp_try_switch_task(task))
>  			complete = false;
> +		if (rwlock_is_contended(&tasklist_lock) || need_resched()) {

Are you able to finish the livepatch transition with this patch?

> +			complete = false;
> +			break;
> +		}
> +	}
>  	read_unlock(&tasklist_lock);
>  
>  	/*

With this patch, any operation which takes the tasklist_lock might
break klp_try_complete_transition(). I am afraid that this might
block the transition for a long time on huge systems with some
specific loads.

And the problem is caused by a printk() added just for debugging.
I wonder if you even use a slow serial port.

You might try to use printk_deferred() instead. Also you might need
to disable interrupts around the read_lock()/read_unlock() to
make sure that the console handling will be deferred after
the tasklist_lock gets released.

Anyway, I am against this patch.

Best Regards,
Petr