Hello,

kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:

commit: d362c5c67bb96ccdc4dd34a781d23348d927392d ("[PATCH] watchdog/core: Fix AA deadlock due to watchdog holding cpu_hotplug_lock and wait for wq")
url: https://github.com/intel-lab-lkp/linux/commits/Luo-Gengkun/watchdog-core-Fix-AA-deadlock-due-to-watchdog-holding-cpu_hotplug_lock-and-wait-for-wq/20240606-233305
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20240606153828.3261006-1-luogengkun@xxxxxxxxxxxxxxx/
patch subject: [PATCH] watchdog/core: Fix AA deadlock due to watchdog holding cpu_hotplug_lock and wait for wq

in testcase: rcutorture
version:
with following parameters:

	runtime: 300s
	test: cpuhotplug
	torture_type: busted

compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202406111537.dd9d27e9-lkp@xxxxxxxxx

[ 87.506482][ T9] WARNING: possible circular locking dependency detected
[ 87.506854][ T9] 6.10.0-rc1-00236-gd362c5c67bb9 #1 Not tainted
[ 87.507186][ T9] ------------------------------------------------------
[ 87.507554][ T9] kworker/0:1/9 is trying to acquire lock:
[ 87.507861][ T9] ffffffff84305f90 (watchdog_mutex){+.+.}-{3:3}, at: lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.509166][ T9]
[ 87.509166][ T9] but task is already holding lock:
[ 87.509550][ T9] ffffc9000009fd58 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works (kernel/workqueue.c:3207)
[ 87.510129][ T9]
[ 87.510129][ T9] which lock already depends on the new lock.
[ 87.510129][ T9]
[ 87.510660][ T9]
[ 87.510660][ T9] the existing dependency chain (in reverse order) is:
[ 87.511125][ T9]
[ 87.511125][ T9] -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
[ 87.511584][ T9]        __flush_work (kernel/workqueue.c:3894)
[ 87.511849][ T9]        work_on_cpu_key (kernel/workqueue.c:683 kernel/workqueue.c:6693)
[ 87.512120][ T9]        cpu_down (kernel/cpu.c:1487)
[ 87.512358][ T9]        device_offline (drivers/base/core.c:?)
[ 87.512631][ T9]        remove_cpu (kernel/cpu.c:1522)
[ 87.512876][ T9]        torture_offline (??:?) torture
[ 87.513217][ T9]        torture_onoff (??:?) torture
[ 87.513535][ T9]        kthread (kernel/kthread.c:391)
[ 87.513777][ T9]        ret_from_fork (arch/x86/kernel/process.c:153)
[ 87.514035][ T9]        ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 87.514311][ T9]
[ 87.514311][ T9] -> #1 (cpu_add_remove_lock){+.+.}-{3:3}:
[ 87.514727][ T9]        __mutex_lock (kernel/locking/mutex.c:608)
[ 87.514986][ T9]        cpu_hotplug_disable (kernel/cpu.c:555)
[ 87.515271][ T9]        __lockup_detector_reconfigure (kernel/watchdog.c:871)
[ 87.515599][ T9]        lockup_detector_setup (kernel/watchdog.c:912)
[ 87.515914][ T9]        kernel_init_freeable (init/main.c:1570)
[ 87.516213][ T9]        kernel_init (init/main.c:1469)
[ 87.516467][ T9]        ret_from_fork (arch/x86/kernel/process.c:153)
[ 87.516727][ T9]        ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 87.517002][ T9]
[ 87.517002][ T9] -> #0 (watchdog_mutex){+.+.}-{3:3}:
[ 87.517415][ T9]        __lock_acquire (kernel/locking/lockdep.c:3135)
[ 87.517695][ T9]        lock_acquire (kernel/locking/lockdep.c:5754)
[ 87.517957][ T9]        __mutex_lock (kernel/locking/mutex.c:608)
[ 87.518215][ T9]        lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.518518][ T9]        _cpu_down (kernel/cpu.c:1450)
[ 87.518768][ T9]        __cpu_down_maps_locked (kernel/cpu.c:1463)
[ 87.519065][ T9]        work_for_cpu_fn (kernel/workqueue.c:6670)
[ 87.519333][ T9]        process_scheduled_works (kernel/workqueue.c:?)
[ 87.519648][ T9]        worker_thread (include/linux/list.h:373 kernel/workqueue.c:946 kernel/workqueue.c:3394)
[ 87.519915][ T9]        kthread (kernel/kthread.c:391)
[ 87.520157][ T9]        ret_from_fork (arch/x86/kernel/process.c:153)
[ 87.520415][ T9]        ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 87.520690][ T9]
[ 87.520690][ T9] other info that might help us debug this:
[ 87.520690][ T9]
[ 87.521221][ T9] Chain exists of:
[ 87.521221][ T9]   watchdog_mutex --> cpu_add_remove_lock --> (work_completion)(&wfc.work)
[ 87.521221][ T9]
[ 87.521963][ T9]  Possible unsafe locking scenario:
[ 87.521963][ T9]
[ 87.522347][ T9]        CPU0                    CPU1
[ 87.522624][ T9]        ----                    ----
[ 87.522902][ T9]   lock((work_completion)(&wfc.work));
[ 87.523191][ T9]                                lock(cpu_add_remove_lock);
[ 87.523569][ T9]                                lock((work_completion)(&wfc.work));
[ 87.523984][ T9]   lock(watchdog_mutex);
[ 87.524212][ T9]
[ 87.524212][ T9]  *** DEADLOCK ***
[ 87.524212][ T9]
[ 87.524628][ T9] 2 locks held by kworker/0:1/9:
[ 87.524885][ T9]  #0: ffff88810007cd58 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works (kernel/workqueue.c:3206)
[ 87.525461][ T9]  #1: ffffc9000009fd58 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works (kernel/workqueue.c:3207)
[ 87.526065][ T9]
[ 87.526065][ T9] stack backtrace:
[ 87.526372][ T9] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.10.0-rc1-00236-gd362c5c67bb9 #1
[ 87.526839][ T9] Workqueue: events work_for_cpu_fn
[ 87.527114][ T9] Call Trace:
[ 87.527292][ T9]  <TASK>
[ 87.527451][ T9]  dump_stack_lvl (lib/dump_stack.c:119)
[ 87.527691][ T9]  check_noncircular (kernel/locking/lockdep.c:?)
[ 87.527955][ T9]  __lock_acquire (kernel/locking/lockdep.c:3135)
[ 87.528218][ T9]  ? lock_release (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:352 kernel/locking/lockdep.c:5436 kernel/locking/lockdep.c:5774)
[ 87.528466][ T9]  lock_acquire (kernel/locking/lockdep.c:5754)
[ 87.528703][ T9]  ? lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.528991][ T9]  ? lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.529293][ T9]  __mutex_lock (kernel/locking/mutex.c:608)
[ 87.529530][ T9]  ? lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.529817][ T9]  ? mark_lock (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:4656)
[ 87.530047][ T9]  ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:?)
[ 87.530361][ T9]  ? _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
[ 87.530635][ T9]  lockup_detector_cleanup (kernel/watchdog.c:937)
[ 87.530911][ T9]  _cpu_down (kernel/cpu.c:1450)
[ 87.531139][ T9]  ? process_scheduled_works (kernel/workqueue.c:3207)
[ 87.531440][ T9]  __cpu_down_maps_locked (kernel/cpu.c:1463)
[ 87.531716][ T9]  ? __pfx___cpu_down_maps_locked (kernel/cpu.c:1460)
[ 87.532039][ T9]  work_for_cpu_fn (kernel/workqueue.c:6670)
[ 87.532285][ T9]  process_scheduled_works (kernel/workqueue.c:?)
[ 87.532594][ T9]  worker_thread (include/linux/list.h:373 kernel/workqueue.c:946 kernel/workqueue.c:3394)
[ 87.532839][ T9]  ? lock_release (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:352 kernel/locking/lockdep.c:5436 kernel/locking/lockdep.c:5774)
[ 87.533103][ T9]  ? __kthread_parkme (kernel/kthread.c:?)
[ 87.533365][ T9]  ? __kthread_parkme (include/linux/instrumented.h:? include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/kthread.c:280)
[ 87.533629][ T9]  kthread (kernel/kthread.c:391)
[ 87.533846][ T9]  ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 87.534117][ T9]  ? __pfx_kthread (kernel/kthread.c:342)
[ 87.534361][ T9]  ret_from_fork (arch/x86/kernel/process.c:153)
[ 87.534597][ T9]  ? __pfx_kthread (kernel/kthread.c:342)
[ 87.534841][ T9]  ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 87.535111][ T9]  </TASK>


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406111537.dd9d27e9-lkp@xxxxxxxxx

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki