The patch titled
     Subject: watchdog/core: fix AA deadlock due to watchdog holding cpu_hotplug_lock and wait for wq
has been added to the -mm mm-nonmm-unstable branch.  Its filename is
     watchdog-core-fix-aa-deadlock-due-to-watchdog-holding-cpu_hotplug_lock-and-wait-for-wq.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/watchdog-core-fix-aa-deadlock-due-to-watchdog-holding-cpu_hotplug_lock-and-wait-for-wq.patch

This patch will later appear in the mm-nonmm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
Subject: watchdog/core: fix AA deadlock due to watchdog holding cpu_hotplug_lock and wait for wq
Date: Thu, 6 Jun 2024 15:38:28 +0000

We found an AA deadlock, as shown below:

TaskA                   TaskB                       WatchDog                        system_wq

...
css_killed_work_fn:
P(cgroup_mutex)
...
                                                    ...
                                                    __lockup_detector_reconfigure:
                                                    P(cpu_hotplug_lock.read)
                                                    ...
                        ...
                        cpu_up:
                        percpu_down_write:
                        P(cpu_hotplug_lock.write)
                                                                                    ...
                                                                                    cgroup_bpf_release:
                                                                                    P(cgroup_mutex)
                                                    smp_call_on_cpu:
                                                    Wait system_wq

cpuset_css_offline:
P(cpu_hotplug_lock.read)

The watchdog is waiting for system_wq to finish its jobs, system_wq is
waiting for cgroup_mutex, and the owner of cgroup_mutex (TaskA) is
waiting for cpu_hotplug_lock.  TaskA cannot take cpu_hotplug_lock for
read because TaskB is already queued as a writer in percpu_down_write(),
which blocks new readers, while TaskB itself waits for the watchdog to
drop its read hold.  The key point is cpu_hotplug_lock: system_wq may be
waiting on other locks.
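A minimal userspace sketch of the wait-for cycle in the scenario above, assuming a simplified model in which each context is blocked behind at most one other.  The tables waits_before_fix and waits_after_fix and the helpers cycle_through() and deadlocked() are hypothetical names for illustration, not kernel APIs.  The "after" table assumes that, once the watchdog only disables hotplug, TaskB's cpu_up() fails with -EBUSY instead of queueing as a writer, so TaskA's read acquisition no longer blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four contexts in the scenario. */
enum task { TASK_A, TASK_B, WATCHDOG, SYSTEM_WQ, NR_TASKS };

/* Marks a context that is not blocked behind anyone. */
#define NOT_WAITING (-1)

/* waits_on[t] is the context that t is blocked behind, or NOT_WAITING. */
static const int waits_before_fix[NR_TASKS] = {
	[TASK_A]    = TASK_B,    /* cpus_read_lock() queued behind the pending writer */
	[TASK_B]    = WATCHDOG,  /* percpu_down_write() waits for the read holder */
	[WATCHDOG]  = SYSTEM_WQ, /* smp_call_on_cpu() waits for the queued work */
	[SYSTEM_WQ] = TASK_A,    /* cgroup_bpf_release() needs cgroup_mutex */
};

/* Assumed state with cpu_hotplug_disable() instead of cpus_read_lock(). */
static const int waits_after_fix[NR_TASKS] = {
	[TASK_A]    = NOT_WAITING, /* no writer is pending, the read lock is granted */
	[TASK_B]    = NOT_WAITING, /* cpu_up() returns -EBUSY instead of waiting */
	[WATCHDOG]  = SYSTEM_WQ,   /* still waits, but holds no cpu_hotplug_lock */
	[SYSTEM_WQ] = TASK_A,      /* only until TaskA drops cgroup_mutex */
};

/* Return true if following the wait edges from start leads back to start. */
bool cycle_through(const int waits_on[NR_TASKS], int start)
{
	assert(start >= 0 && start < NR_TASKS);

	int cur = waits_on[start];

	/* Each context waits on at most one other, so NR_TASKS hops suffice. */
	for (int hops = 0; hops < NR_TASKS && cur != NOT_WAITING; hops++) {
		if (cur == start)
			return true;
		cur = waits_on[cur];
	}
	return false;
}

/* A set of contexts is deadlocked if any of them sits on a wait cycle. */
bool deadlocked(const int waits_on[NR_TASKS])
{
	for (int t = 0; t < NR_TASKS; t++)
		if (cycle_through(waits_on, t))
			return true;
	return false;
}
```

In this model the four edges before the fix close a cycle of length four (WatchDog, system_wq, TaskA, TaskB).  Removing the two edges that exist only because the watchdog holds cpu_hotplug_lock for read leaves a chain that eventually drains.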
It is risky to hold a lock while waiting for system_wq, because we never
know which jobs system_wq is running.  Fix this by replacing
cpus_read_lock()/cpus_read_unlock() with
cpu_hotplug_disable()/cpu_hotplug_enable(), which still prevents CPUs
from going offline or online during reconfiguration but does not keep
cpu_hotplug_lock held while the watchdog waits.

Link: https://lkml.kernel.org/r/20240606153828.3261006-1-luogengkun@xxxxxxxxxxxxxxx
Fixes: e31d6883f21c ("watchdog/core, powerpc: Lock cpus across reconfiguration")
Signed-off-by: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
Cc: Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx>
Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Cc: Douglas Anderson <dianders@xxxxxxxxxxxx>
Cc: Lecopzer Chen <lecopzer.chen@xxxxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> (powerpc)
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Naveen N. Rao <naveen.n.rao@xxxxxxxxxxxxx>
Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
Cc: Petr Mladek <pmladek@xxxxxxxx>
Cc: Pingfan Liu <kernelfans@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Tom Rix <trix@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 kernel/watchdog.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/watchdog.c~watchdog-core-fix-aa-deadlock-due-to-watchdog-holding-cpu_hotplug_lock-and-wait-for-wq
+++ a/kernel/watchdog.c
@@ -867,7 +867,7 @@ int lockup_detector_offline_cpu(unsigned
 
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
+	cpu_hotplug_disable();
 	watchdog_hardlockup_stop();
 
 	softlockup_stop_all();
@@ -877,7 +877,7 @@ static void __lockup_detector_reconfigur
 		softlockup_start_all();
 
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
+	cpu_hotplug_enable();
 	/*
 	 * Must be called outside the cpus locked section to prevent
 	 * recursive locking in the perf code.
@@ -916,11 +916,11 @@ static __init void lockup_detector_setup
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
 static void __lockup_detector_reconfigure(void)
 {
-	cpus_read_lock();
+	cpu_hotplug_disable();
 	watchdog_hardlockup_stop();
 	lockup_detector_update_enable();
 	watchdog_hardlockup_start();
-	cpus_read_unlock();
+	cpu_hotplug_enable();
 }
 void lockup_detector_reconfigure(void)
 {
_

Patches currently in -mm which might be from luogengkun@xxxxxxxxxxxxxxx are

watchdog-core-fix-aa-deadlock-due-to-watchdog-holding-cpu_hotplug_lock-and-wait-for-wq.patch
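The difference between the two primitives in this patch can be sketched in userspace.  This is a hypothetical model, not kernel code: a pthread mutex stands in for cpu_add_remove_lock, a plain int stands in for cpu_hotplug_disabled, and every model_* name is invented for illustration.  It assumes, as in kernel/cpu.c, that the disable path holds the mutex only long enough to bump a counter and that a hotplug request checks the counter and fails with -EBUSY rather than waiting.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for cpu_add_remove_lock and cpu_hotplug_disabled. */
static pthread_mutex_t maps_lock = PTHREAD_MUTEX_INITIALIZER;
static int hotplug_disabled;

/* Like cpu_hotplug_disable(): bump the counter, then release the mutex. */
void model_hotplug_disable(void)
{
	pthread_mutex_lock(&maps_lock);
	hotplug_disabled++;
	pthread_mutex_unlock(&maps_lock);
}

/* Like cpu_hotplug_enable(): an unbalanced enable is a bug. */
void model_hotplug_enable(void)
{
	pthread_mutex_lock(&maps_lock);
	assert(hotplug_disabled > 0);
	hotplug_disabled--;
	pthread_mutex_unlock(&maps_lock);
}

/* Like cpu_up(): refuse with -EBUSY while disabled instead of waiting. */
int model_cpu_up(void)
{
	int ret = 0;

	pthread_mutex_lock(&maps_lock);
	if (hotplug_disabled)
		ret = -EBUSY;
	/* else: the real hotplug work would run here */
	pthread_mutex_unlock(&maps_lock);
	return ret;
}

/* Hypothetical reconfigure path: no lock is held across wait_for_work(). */
void model_reconfigure(void (*wait_for_work)(void))
{
	model_hotplug_disable();
	wait_for_work();	/* stands in for smp_call_on_cpu() waiting on system_wq */
	model_hotplug_enable();
}

/* Demo callback: a hotplug attempt made while the "watchdog" is waiting. */
int up_result_during_wait = 1;

void try_cpu_up_while_waiting(void)
{
	up_result_during_wait = model_cpu_up();
}
```

Calling model_reconfigure(try_cpu_up_while_waiting) runs the hotplug attempt inside the waiting window on the same thread.  In this model it returns -EBUSY and the call completes; had the window been guarded by a read-held rwsem, the same write attempt would block behind its own reader.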