Re: SRCU hung task on 5.10.y on synchronize_srcu(&fsnotify_mark_srcu)

On Tue 27-08-24 20:01:27, Jon Kohler wrote:
> Hey Paul, Lai, Josh, and the RCU list and Jan/FS list -
> Reaching out about a tricky hung task issue that I'm running into. I've
> got a virtualized Linux guest on top of a KVM based platform, running
> a 5.10.y based kernel. The issue we're running into is a hung task that
> *only* happens on shutdown/reboot of this particular VM once every 
> 20-50 times.
> 
> The signature of the hung task is always similar to the output below,
> where we appear to hang on the call to 
>     synchronize_srcu(&fsnotify_mark_srcu)
> in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn,
> where two kernel threads are both calling synchronize_srcu, then
> scheduling out in wait_for_completion, and completely going out to
> lunch for over 4 minutes. This then triggers the hung task timeout and
> things blow up.

Well, the most obvious reason for this would be that some process is
hanging somewhere with fsnotify_mark_srcu held. When this happens, can you
trigger sysrq-w in the VM and send its output here?
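
In case it helps, here is a rough sketch of how that could be collected
from inside the guest. It assumes Python and a root shell are still usable
at that point in the shutdown, which may not be true this late in the
sequence. The helper names (parse_stat, blocked_tasks, trigger_sysrq_w)
are made up for illustration. Writing "w" to /proc/sysrq-trigger asks the
kernel to dump blocked (D state) tasks to the kernel log; the /proc scan is
only a userspace approximation of that list and carries no stack traces.

```python
import os


def parse_stat(text):
    """Split the contents of a /proc/<pid>/stat file into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (tid, comm) for every task currently in uninterruptible sleep."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            stat_path = os.path.join(task_dir, tid, "stat")
            try:
                with open(stat_path, errors="replace") as f:
                    info = parse_stat(f.read())
            except OSError:
                continue  # task exited between listdir() and open()
            if info[2] == "D":
                found.append(info[:2])
    return found


def trigger_sysrq_w(path="/proc/sysrq-trigger"):
    """Ask the kernel to dump blocked tasks to the kernel log (needs root)."""
    with open(path, "w") as f:
        f.write("w")


if __name__ == "__main__":
    # Userspace view first, then the kernel's own dump (read it with dmesg
    # or from the console).
    for tid, comm in blocked_tasks():
        print(tid, comm)
    trigger_sysrq_w()
```

The kernel's dump is the one that matters here, since it includes the
stack of each blocked task; the /proc listing is just a quick sanity check
that something is in D state at all.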

> We are running audit=1 for this system and are using an el8 based
> userspace.
> 
> I've flipped through the fs/notify code base for both 5.10 as well as
> upstream mainline to see if something jumped off the page, and I
> haven't yet spotted any particular suspect code from the caller side.
> 
> This hang appears to come up at the very end of the shutdown/reboot
> process, seemingly after the system starts to unwind through initrd.
> 
> What I'm working on now is adding some instrumentation to the dracut
> shutdown initrd scripts to see how far we get down that path
> before the system fails to make forward progress, which may give some
> hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and
> a plethora of DEBUG options [2], and didn't get anything interesting.
> To be clear, we haven't seen lockdep spit out any complaints as of yet.

The fact that lockdep doesn't report anything is interesting, but then
lockdep doesn't track everything. In particular, I think SRCU itself isn't
tracked by lockdep.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



