Re: SRCU hung task on 5.10.y on synchronize_srcu(&fsnotify_mark_srcu)

Jon Kohler <jon@xxxxxxxxxxx> · Wed, 4 Sep 2024 14:40:07 +0000

> On Sep 4, 2024, at 5:19 AM, Jan Kara <jack@xxxxxxx> wrote:
> 
> On Tue 27-08-24 20:01:27, Jon Kohler wrote:
>> Hey Paul, Lai, Josh, and the RCU list and Jan/FS list -
>> Reaching out about a tricky hung task issue that I'm running into. I've
>> got a virtualized Linux guest on top of a KVM based platform, running
>> a 5.10.y based kernel. The issue we're running into is a hung task that
>> *only* happens on shutdown/reboot of this particular VM once every 
>> 20-50 times.
>> 
>> The signature of the hung task is always similar to the output below,
>> where we appear to hang on the call to 
>>    synchronize_srcu(&fsnotify_mark_srcu)
>> in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn,
>> where two kernel threads are both calling synchronize_srcu, then
>> scheduling out in wait_for_completion, and completely going out to
>> lunch for over 4 minutes. This then triggers the hung task timeout and
>> things blow up.
> 
> Well, the most obvious reason for this would be that some process is
> hanging somewhere with fsnotify_mark_srcu held. When this happens, can you
> trigger sysrq-w in the VM and send here its output?
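
For anyone following along, here is a minimal sketch of triggering that
dump from a script, assuming root inside the guest. The helper name
trigger_sysrq and its trigger_path parameter are hypothetical and exist
only for illustration. Writing "w" to /proc/sysrq-trigger asks the kernel
to log tasks in uninterruptible (blocked) state.

```python
from pathlib import Path


def trigger_sysrq(key: str = "w", trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Write one SysRq command character to the trigger file.

    trigger_path is a parameter only so the helper can be exercised against
    an ordinary file; on a real system it needs root and the default path.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    # "w" requests a dump of blocked (uninterruptible) tasks.
    Path(trigger_path).write_text(key)
```

The resulting report goes to the kernel log, so it would be collected
from dmesg or the serial console afterwards.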

Jan - Thanks for the ping, that is *exactly* what is happening here.
Some developments since my last note: the patch Neeraj pointed out
wasn't the issue. The real cause was a confluence of realtime thread
configurations that ended up completely starving whichever CPU was
processing per-CPU callbacks. One realtime thread would go out to lunch
and simply never yield. Unfortunately, this particular system was configured
with RT_RUNTIME_SHARE, so that one realtime thread going out to lunch ate
the entire system.
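
In case it helps anyone checking their own systems, here is a rough
sketch of how the relevant settings could be inspected. The function
names are hypothetical. The assumptions are that the scheduler feature
list is read from debugfs (on 5.10 I believe that is
/sys/kernel/debug/sched_features, with disabled features carrying a NO_
prefix) and that the RT budget comes from
/proc/sys/kernel/sched_rt_runtime_us and sched_rt_period_us.

```python
def rt_runtime_share_enabled(features_text: str) -> bool:
    """Report whether RT_RUNTIME_SHARE is on, given the feature-list text.

    Assumes the debugfs format: whitespace-separated feature names, with
    disabled features prefixed by NO_.
    """
    tokens = features_text.split()
    if "RT_RUNTIME_SHARE" in tokens:
        return True
    if "NO_RT_RUNTIME_SHARE" in tokens:
        return False
    raise ValueError("RT_RUNTIME_SHARE not listed in feature text")


def rt_bandwidth_fraction(runtime_us: int, period_us: int) -> float:
    """Fraction of each period that realtime tasks may consume.

    A negative runtime (-1) means RT throttling is disabled, which is
    treated here as the full period.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        return 1.0
    return runtime_us / period_us
```

Feeding in the contents of those files gives a quick yes/no on the
feature and the per-period RT budget, e.g. 950000 over 1000000 works out
to 0.95.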

What was odd is that this never, ever happened during runtime on some
of these systems that have been up for years and getting beat up heavily,
but rather only on shutdown. We’ve got more to chase down internally on
that.

One thing I wanted to bring up while I have you: over the years I have
noticed, through various hits on Google, mailing lists, etc., that this
specific type of lockup with fsnotify_mark_srcu seems to happen now and
then for various oddball reasons, with various root causes.

It made me wonder whether there is a better structure that could be
used here that might be a bit more durable. To be clear, I’m not saying that
SRCU *is not* durable or anything of the sort (I promise!) but rather
wondering if there is anything we could think about tweaking on the
fsnotify side of the house to be more efficient.

Thoughts?

> 
>> We are running audit=1 for this system and are using an el8 based
>> userspace.
>> 
>> I've flipped through the fs/notify code base for both 5.10 as well as
>> upstream mainline to see if something jumped off the page, and I
>> haven't yet spotted any particular suspect code from the caller side.
>> 
>> This hang appears to come up at the very end of the shutdown/reboot
>> process, seemingly after the system starts to unwind through initrd.
>> 
>> What I'm working on now is adding some instrumentation to the dracut
>> shutdown initrd scripts to see how far we get down that path
>> before the system fails to make forward progress, which may give some
>> hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and
>> a plethora of DEBUG options [2], and didn't get anything interesting.
>> To be clear, we haven't seen lockdep spit out any complaints as of yet.
> 
> The fact that lockdep doesn't report anything is interesting but then
> lockdep doesn't track everything. In particular I think SRCU itself isn't
> tracked by lockdep.
> 
> Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR