Hey Paul, Lai, Josh, and the RCU list and Jan/FS list - Reaching out about a tricky hung task issue that I'm running into. I've got a virtualized Linux guest on top of a KVM based platform, running a 5.10.y based kernel. The issue we're running into is a hung task that *only* happens on shutdown/reboot of this particular VM once every 20-50 times. The signature of the hung task is always similar to the output below, where we appear to hang on the call to synchronize_srcu(&fsnotify_mark_srcu) in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn, where two kernel threads are both calling synchronize_srcu, then scheduling out in wait_for_completion, and completely going out to lunch for over 4 minutes. This then triggers the hung task timeout and things blow up. We are running audit=1 for this system and are using an el8 based userspace. I've flipped through the fs/notify code base for both 5.10 as well as upstream mainline to see if something jumped off the page, and I haven't yet spotted any particular suspect code from the caller side. This hang appears to come up at the very end of the shutdown/reboot process, seemingly after the system starts to unwind through initrd. What I'm working on now is adding some instrumentation to the dracut shutdown initrd scripts to see if I can how far we get down that path before the system fails to make forward progress, which may give some hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and a plethora of DEBUG options [2], and didn't get anything interesting. To be clear, we haven't seen lockdep spit out any complaints as of yet. Reaching out to see if this sounds familar to anyone on the list, or if there are any particular areas of the RCU code base that might be suspect for this kind of issue. I'm happy to provide more information, as frankly, I'm quite stumped at the moment. Thanks all, Jon [1] panic trace Normal shutdown process, then hangs on the following: ... dracut Warning: Killing all remaining processes ... INFO: task kworker/u20:7:1200701 blocked for more than 241 seconds. Tainted: G O 5.10.205-2.el8.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u20:7 state:D stack: 0 pid:1200701 ppid: 2 flags:0x00004080 Workqueue: events_unbound fsnotify_connector_destroy_workfn Call Trace: __schedule+0x267/0x790 schedule+0x3c/0xb0 schedule_timeout+0x219/0x2b0 wait_for_completion+0x9e/0x100 __synchronize_srcu.part.24+0x83/0xb0 ? __bpf_trace_rcu_utilization+0x10/0x10 ? synchronize_srcu+0x5d/0xf0 fsnotify_connector_destroy_workfn+0x46/0x80 process_one_work+0x1fc/0x390 worker_thread+0x2d/0x3e0 ? process_one_work+0x390/0x390 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x1f/0x30 INFO: task kworker/u20:8:1287360 blocked for more than 241 seconds. Tainted: G O 5.10.205-2.el8.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u20:8 state:D stack: 0 pid:1287360 ppid: 2 flags:0x00004080 Workqueue: events_unbound fsnotify_mark_destroy_workfn Call Trace: __schedule+0x267/0x790 schedule+0x3c/0xb0 schedule_timeout+0x219/0x2b0 ? add_timer+0x14a/0x200 wait_for_completion+0x9e/0x100 __synchronize_srcu.part.24+0x83/0xb0 ? __bpf_trace_rcu_utilization+0x10/0x10 fsnotify_mark_destroy_workfn+0x77/0xe0 process_one_work+0x1fc/0x390 ? process_one_work+0x390/0x390 worker_thread+0x2d/0x3e0 ? process_one_work+0x390/0x390 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x1f/0x30 Kernel panic - not syncing: hung_task: blocked tasks CPU: 1 PID: 64 Comm: khungtaskd Kdump: loaded Tainted: G O 5.10.205-2.el8.x86_64 #1 Hardware name: Red Hat KVM, BIOS 20230302.1.2662.el8 04/01/2014 Call Trace: dump_stack+0x6d/0x8c panic+0x114/0x2ea watchdog.cold.8+0xb5/0xb5 ? hungtask_pm_notify+0x50/0x50 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x1f/0x30 [2] additional debugging config knobs turned up. CONFIG_PROVE_LOCKING=y CONFIG_LOCK_STAT=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y CONFIG_DEBUG_RWSEMS=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_LOCKDEP=y CONFIG_LOCKDEP_BITS=15 CONFIG_LOCKDEP_CHAINS_BITS=16 CONFIG_LOCKDEP_STACK_TRACE_BITS=19 CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14 CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12 CONFIG_DEBUG_SHIRQ=y CONFIG_WQ_WATCHDOG=y CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_DEBUG_LIST=y CONFIG_DEBUG_PLIST=y CONFIG_DEBUG_SG=y CONFIG_DEBUG_NOTIFIERS=y CONFIG_BUG_ON_DATA_CORRUPTION=y