On Tue, Sep 01, 2020 at 07:36:12AM +1000, Dave Chinner wrote:
> On Mon, Aug 31, 2020 at 02:19:15PM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 31, 2020 at 03:22:15PM -0400, Mikulas Patocka wrote:
> > > Hi
> > >
> > > I report this RCU stall when working with one 512GiB file the XFS
> > > filesystem on persistent memory. Except for the warning, there was no
> > > observed misbehavior.
> > >
> > > Perhaps, it is missing cond_resched() somewhere.
> >
> > Yikes, you can send a 2T request to a pmem device??
> >
> > /sys/block/pmem0/queue/max_hw_sectors_kb : 2147483647
> >
> > My puny laptop can only push 29GB/s, which I guess means we could stall
> > on an IO request for 70 seconds...
>
> This looks like another symptom of the same "bio sizes in writeback
> are now unbound if contiguous physical pages are added to them"
> problem I raised here when considering a similar hard lockup report
> with a 2GB bio:
>
> https://lore.kernel.org/linux-xfs/20200821215358.GG7941@xxxxxxxxxxxxxxxxxxx/
>
> Quote:
>
> | .e. I'm not looking at this as a "bio overflow bug" - I'm
> | commenting on what this overflow implies from an architectural point
> | of view. i.e. that uncapped bio sizes and bio chain lengths in
> | writeback are actually a bad thing and something we've always
> | tried to avoid doing....
>
> This looks like another instance of the same problem...
>
> It really does look like iomap needs to cap the length of ioend and
> bio chains...

And Brian pointed out he'd already written such a patch:

https://lore.kernel.org/linux-xfs/20200825144917.GA321765@bfoster/#t

I missed it because I had some hardware issues with the machine that
stores my mail around then.

This does what I was suggesting, so it would be worth testing to see
if it fixes this problem as well.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx