[PATCH] livepatch: Avoid hard lockup caused by klp_try_switch_task()

I encountered a hard lockup while attempting to reproduce the panic issue
that occurred on our production servers [0]. The hard lockup manifests as
follows:

[15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
[15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
[15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
[15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
[15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
[15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
[15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
[15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
[15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
[15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
[15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
[15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
[15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
[15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
[15852778.412319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
...
[15852778.412374] CPU: 15 PID: 1 Comm: systemd Kdump: loaded Tainted: G S      W  O  K    6.1.52-3
[15852778.412377] Hardware name: New H3C Technologies Co., Ltd. H3C UniServer R4950 G5/RS45M2C9S, BIOS 5.12 10/15/2021
[15852778.412378] RIP: 0010:queued_write_lock_slowpath+0x75/0x135
...
[15852778.412397] Call Trace:
[15852778.412398]  <NMI>
[15852778.412400]  ? show_regs.cold+0x1a/0x1f
[15852778.412403]  ? watchdog_overflow_callback.cold+0x1e/0x70
[15852778.412406]  ? __perf_event_overflow+0x102/0x1e0
[15852778.412409]  ? perf_event_overflow+0x19/0x20
[15852778.412411]  ? x86_pmu_handle_irq+0xf7/0x160
[15852778.412415]  ? flush_tlb_one_kernel+0xe/0x30
[15852778.412418]  ? __set_pte_vaddr+0x2d/0x40
[15852778.412421]  ? set_pte_vaddr_p4d+0x3d/0x50
[15852778.412423]  ? set_pte_vaddr+0x6d/0xa0
[15852778.412424]  ? __native_set_fixmap+0x28/0x40
[15852778.412426]  ? native_set_fixmap+0x54/0x60
[15852778.412428]  ? ghes_copy_tofrom_phys+0x75/0x120
[15852778.412431]  ? __ghes_peek_estatus.isra.0+0x4e/0xb0
[15852778.412434]  ? ghes_in_nmi_queue_one_entry.constprop.0+0x3d/0x240
[15852778.412437]  ? amd_pmu_handle_irq+0x48/0xc0
[15852778.412438]  ? perf_event_nmi_handler+0x2d/0x50
[15852778.412440]  ? nmi_handle+0x60/0x120
[15852778.412443]  ? default_do_nmi+0x45/0x120
[15852778.412446]  ? exc_nmi+0x118/0x150
[15852778.412447]  ? end_repeat_nmi+0x16/0x67
[15852778.412450]  ? copy_process+0xf01/0x19f0
[15852778.412452]  ? queued_write_lock_slowpath+0x75/0x135
[15852778.412455]  ? queued_write_lock_slowpath+0x75/0x135
[15852778.412457]  ? queued_write_lock_slowpath+0x75/0x135
[15852778.412459]  </NMI>
[15852778.412460]  <TASK>
[15852778.412461]  _raw_write_lock_irq+0x43/0x50
[15852778.412463]  copy_process+0xf01/0x19f0
[15852778.412466]  kernel_clone+0x9d/0x3e0
[15852778.412468]  ? autofs_dev_ioctl_requester+0x100/0x100
[15852778.412471]  __do_sys_clone+0x66/0x90
[15852778.412475]  __x64_sys_clone+0x25/0x30
[15852778.412477]  do_syscall_64+0x38/0x90
[15852778.412478]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[15852778.412481] RIP: 0033:0x7f426bb3b9c1
...

Notably, dynamic_debug was enabled to collect debug information while
applying the livepatch, producing a large amount of debug output and
prolonging the task-list walk.

The issue arises because klp_try_complete_transition() holds tasklist_lock
for reading while it walks every thread, so any task that then attempts to
acquire the write lock must spin until the readers drain. This becomes
problematic in the copy_process() path, which takes the write lock with
IRQs disabled, leading to the hard lockup. To prevent this, check for
contention on the rwlock (and for a pending reschedule) before proceeding,
and bail out early so the transition is retried later.

Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@xxxxxxxxxxxxxx/ [0]
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
---
 kernel/livepatch/transition.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index ba069459c101..774017825bb4 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -467,9 +467,14 @@ void klp_try_complete_transition(void)
 	 * unless the patch includes changes to a very common function.
 	 */
 	read_lock(&tasklist_lock);
-	for_each_process_thread(g, task)
+	for_each_process_thread(g, task) {
 		if (!klp_try_switch_task(task))
 			complete = false;
+		if (rwlock_is_contended(&tasklist_lock) || need_resched()) {
+			complete = false;
+			break;
+		}
+	}
 	read_unlock(&tasklist_lock);
 
 	/*
-- 
2.43.5
