Re: RCU stall when using XFS

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 1 Sep 2020 07:36:12 +1000

On Mon, Aug 31, 2020 at 02:19:15PM -0700, Darrick J. Wong wrote:
> On Mon, Aug 31, 2020 at 03:22:15PM -0400, Mikulas Patocka wrote:
> > Hi
> > 
> > I report this RCU stall when working with one 512GiB file on the XFS 
> > filesystem on persistent memory. Except for the warning, there was no 
> > observed misbehavior.
> > 
> > Perhaps, it is missing cond_resched() somewhere.
> 
> Yikes, you can send a 2T request to a pmem device??
> 
> /sys/block/pmem0/queue/max_hw_sectors_kb : 2147483647
> 
> My puny laptop can only push 29GB/s, which I guess means we could stall
> on an IO request for 70 seconds...
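
The numbers quoted above can be checked with a small sketch. It assumes
the sysfs value is in KiB (as the _kb suffix suggests) and reads "29GB/s"
as 29 GiB/s of sustained throughput; the helper names are made up for
illustration only:

```c
#include <stdint.h>

/*
 * Largest single request in bytes, given the value read from
 * /sys/block/<dev>/queue/max_hw_sectors_kb (assumed to be in KiB).
 */
static uint64_t max_request_bytes(uint64_t max_hw_sectors_kb)
{
	return max_hw_sectors_kb * 1024ULL;
}

/*
 * Seconds needed to move `bytes` at `gib_per_sec` GiB/s, assuming the
 * device sustains that rate for the whole request.
 */
static double transfer_seconds(uint64_t bytes, double gib_per_sec)
{
	return (double)bytes / (gib_per_sec * 1024.0 * 1024.0 * 1024.0);
}
```

With max_hw_sectors_kb = 2147483647 this gives 2^41 - 1024 bytes, i.e.
just under 2TiB, and a bit over 70 seconds at 29 GiB/s.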

This looks like another symptom of the same "bio sizes in writeback
are now unbounded if contiguous physical pages are added to them"
problem I raised here when considering a similar hard lockup report
with a 2GB bio:

https://lore.kernel.org/linux-xfs/20200821215358.GG7941@xxxxxxxxxxxxxxxxxxx/

Quote:

| i.e. I'm not looking at this as a "bio overflow bug" - I'm
| commenting on what this overflow implies from an architectural point
| of view. i.e. that uncapped bio sizes and bio chain lengths in
| writeback are actually a bad thing and something we've always
| tried to avoid doing....

This looks like another instance of the same problem...

It really does look like iomap needs to cap the length of ioend and
bio chains...
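
As a rough user-space sketch of what such a cap could look like (this is
not the iomap code; struct ioend_sketch, ioend_can_add(), ioends_needed()
and the 16MiB limit are all hypothetical and chosen only for
illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on bytes per ioend, picked only for illustration. */
#define IOEND_MAX_BYTES	(16ULL * 1024 * 1024)

/* Stand-in for an ioend: one contiguous range being built for writeback. */
struct ioend_sketch {
	uint64_t offset;	/* file offset where this ioend starts */
	uint64_t size;		/* bytes accumulated so far */
};

/*
 * Return true if `len` bytes at `offset` may be appended to `io`.
 * An empty ioend always accepts, so forward progress is guaranteed.
 */
static bool ioend_can_add(const struct ioend_sketch *io,
			  uint64_t offset, uint64_t len)
{
	if (io->size == 0)
		return true;
	if (offset != io->offset + io->size)
		return false;	/* not contiguous with what we have */
	return io->size + len <= IOEND_MAX_BYTES;
}

/*
 * Count how many ioends a contiguous dirty range of `total` bytes is
 * split into when it is added `chunk` bytes at a time.
 */
static uint64_t ioends_needed(uint64_t total, uint64_t chunk)
{
	struct ioend_sketch io = { 0, 0 };
	uint64_t count = 0, pos = 0;

	if (chunk == 0)
		return 0;	/* nothing sensible to do */

	while (pos < total) {
		uint64_t len = total - pos < chunk ? total - pos : chunk;

		if (!ioend_can_add(&io, pos, len)) {
			count++;	/* "submit" the full ioend */
			io.size = 0;
		}
		if (io.size == 0)
			io.offset = pos;	/* start a new ioend here */
		io.size += len;
		pos += len;
	}
	if (io.size)
		count++;	/* the final, partially filled ioend */
	return count;
}
```

With this cap, 64MiB of contiguous dirty data added in 4KiB pages ends
up as four ioends instead of one, so the work done per completion stays
bounded regardless of how large the contiguous range is.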

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx