Re: SRCU hung task on 5.10.y on synchronize_srcu(&fsnotify_mark_srcu)

On Wed 04-09-24 14:40:07, Jon Kohler wrote:
> 
> 
> > On Sep 4, 2024, at 5:19 AM, Jan Kara <jack@xxxxxxx> wrote:
> > 
> > On Tue 27-08-24 20:01:27, Jon Kohler wrote:
> >> Hey Paul, Lai, Josh, and the RCU list and Jan/FS list -
> >> Reaching out about a tricky hung task issue that I'm running into. I've
> >> got a virtualized Linux guest on top of a KVM based platform, running
> >> a 5.10.y based kernel. The issue we're running into is a hung task that
> >> *only* happens on shutdown/reboot of this particular VM once every 
> >> 20-50 times.
> >> 
> >> The signature of the hung task is always similar to the output below,
> >> where we appear to hang on the call to 
> >>    synchronize_srcu(&fsnotify_mark_srcu)
> >> in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn,
> >> where two kernel threads are both calling synchronize_srcu, then
> >> scheduling out in wait_for_completion, and completely going out to
> >> lunch for over 4 minutes. This then triggers the hung task timeout and
> >> things blow up.
> > 
> > Well, the most obvious reason for this would be that some process is
> > hanging somewhere with fsnotify_mark_srcu held. When this happens, can you
> > trigger sysrq-w in the VM and send its output here?
> 
> Jan - Thanks for the ping, that is *exactly* what is happening here.
> Some developments since my last note: the patch Neeraj pointed out
> wasn't the issue; rather, a confluence of realtime thread configurations
> ended up completely starving whatever CPU was processing the per-CPU
> callbacks. One thread would go out to lunch completely and just never
> yield. Unfortunately, this particular system was configured with
> RT_RUNTIME_SHARE, so that realtime thread going out to lunch ate the
> entire system.

Glad to hear this is explained (at least partially) :)
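
If I'm reading the report right, the failure mode boils down to: the SRCU
grace-period machinery (including the completion that synchronize_srcu()
waits on) runs from per-CPU workqueue items, so a SCHED_FIFO task that
never sleeps on one CPU - with RT throttling defeated because
RT_RUNTIME_SHARE lets it borrow runtime from idle CPUs - can stall every
synchronize_srcu() caller in the system. A minimal userspace sketch of
that kind of CPU hog (hypothetical, not the actual workload from your
report) would be:

/*
 * Hypothetical reproducer sketch, not the real workload: pin a
 * SCHED_FIFO busy loop to one CPU.  If RT throttling is defeated
 * (e.g. RT_RUNTIME_SHARE borrowing runtime from idle CPUs), the
 * per-CPU kworkers on that CPU - including the ones that complete
 * SRCU grace periods - never run, and synchronize_srcu() callers
 * block.  Needs root / CAP_SYS_NICE.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	int cpu = argc > 1 ? atoi(argv[1]) : 0;
	struct sched_param sp = { .sched_priority = 99 };
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) ||
	    sched_setscheduler(0, SCHED_FIFO, &sp)) {
		perror("setup");
		return 1;
	}
	for (;;)
		;	/* spin: never yields, never sleeps */
}

Pinning such a loop to the CPU that owns the pending SRCU callbacks
should reproduce the same hung-task signature you saw on shutdown.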

> What was odd is that this never, ever happened during runtime on some
> of these systems that have been up for years and getting beat up heavily,
> but rather only on shutdown. We’ve got more to chase down internally on
> that.
> 
> One thing I wanted to bring up while I have you: I have noticed, through
> various hits on Google, mailing lists, etc. over the years, that this
> specific type of lockup with fsnotify_mark_srcu seems to happen now and
> then for various oddball reasons, with various root causes.
> 
> It made me wonder if there is a better structure that could be used
> here, one that might be a bit more durable. To be clear, I’m not saying that
> SRCU *is not* durable or anything of the sort (I promise!), but rather
> wondering if there is anything we could think about tweaking on the
> fsnotify side of the house to be more efficient.

Well, fsnotify_mark_srcu used to be a big problem in the past, when the
fanotify code waited for a userspace response to a fanotify event with it
held. And when userspace didn't reply, the kernel got stuck. After we
removed that sore spot a couple of years ago, I'm not aware of any more
problems with it. In fact your report is probably the first one in a few
years. So I'm hoping your google hits are mostly from the past / with old
kernels :)
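
For anyone digging this thread out of the archives later, that old
anti-pattern reduces to something like the sketch below (a made-up,
module-style snippet, not the actual fanotify code): a reader that
sleeps unbounded inside the SRCU read-side critical section drags out
every later synchronize_srcu() on the same srcu_struct.

#include <linux/srcu.h>
#include <linux/wait.h>

DEFINE_STATIC_SRCU(demo_srcu);
static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
static bool demo_replied;

/* Reader: sleeps waiting for an external (userspace) reply while still
 * inside the SRCU read-side critical section.  If the reply never
 * comes, the read section never ends... */
static void demo_handle_event(void)
{
	int idx = srcu_read_lock(&demo_srcu);

	wait_event(demo_wq, demo_replied);

	srcu_read_unlock(&demo_srcu, idx);
}

/* Writer: ...and this grace-period wait inherits the hang - the same
 * synchronize_srcu() stall signature as in the report. */
static void demo_destroy_marks(void)
{
	synchronize_srcu(&demo_srcu);
}

The actual fix back then, if I remember right, was essentially to drop
fsnotify_mark_srcu before waiting for the userspace response and pin
the marks by reference instead, so the read side can no longer block a
grace period for an unbounded time.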

								Honza

> >> We are running audit=1 for this system and are using an el8 based
> >> userspace.
> >> 
> >> I've flipped through the fs/notify code base for both 5.10 as well as
> >> upstream mainline to see if something jumped off the page, and I
> >> haven't yet spotted any particular suspect code from the caller side.
> >> 
> >> This hang appears to come up at the very end of the shutdown/reboot
> >> process, seemingly after the system starts to unwind through initrd.
> >> 
> >> What I'm working on now is adding some instrumentation to the dracut
> >> shutdown initrd scripts to see how far we get down that path
> >> before the system fails to make forward progress, which may give some
> >> hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and
> >> a plethora of DEBUG options [2], and didn't get anything interesting.
> >> To be clear, we haven't seen lockdep spit out any complaints as of yet.
> > 
> > The fact that lockdep doesn't report anything is interesting but then
> > lockdep doesn't track everything. In particular I think SRCU itself isn't
> > tracked by lockdep.
> > 
> > Honza
> > -- 
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
> 
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



