On Tue 27-08-24 20:01:27, Jon Kohler wrote:
> Hey Paul, Lai, Josh, and the RCU list and Jan/FS list -
> Reaching out about a tricky hung task issue that I'm running into. I've
> got a virtualized Linux guest on top of a KVM based platform, running
> a 5.10.y based kernel. The issue we're running into is a hung task that
> *only* happens on shutdown/reboot of this particular VM once every
> 20-50 times.
>
> The signature of the hung task is always similar to the output below,
> where we appear to hang on the call to
> synchronize_srcu(&fsnotify_mark_srcu)
> in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn,
> where two kernel threads are both calling synchronize_srcu, then
> scheduling out in wait_for_completion, and completely going out to
> lunch for over 4 minutes. This then triggers the hung task timeout and
> things blow up.

Well, the most obvious reason for this would be that some process is
hanging somewhere with fsnotify_mark_srcu held. When this happens, can you
trigger sysrq-w in the VM and send its output here?

> We are running audit=1 for this system and are using an el8 based
> userspace.
>
> I've flipped through the fs/notify code base for both 5.10 as well as
> upstream mainline to see if something jumped off the page, and I
> haven't yet spotted any particular suspect code from the caller side.
>
> This hang appears to come up at the very end of the shutdown/reboot
> process, seemingly after the system starts to unwind through initrd.
>
> What I'm working on now is adding some instrumentation to the dracut
> shutdown initrd scripts to see if I can tell how far we get down that path
> before the system fails to make forward progress, which may give some
> hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and
> a plethora of DEBUG options [2], and didn't get anything interesting.
> To be clear, we haven't seen lockdep spit out any complaints as of yet.

The fact that lockdep doesn't report anything is interesting, but then
lockdep doesn't track everything. In particular I think SRCU itself isn't
tracked by lockdep.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR