Re: [PATCH] xfs: don't change to infinate lock to avoid dead lock

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 25 Apr 2020 07:37:29 +1000

On Fri, Apr 24, 2020 at 09:58:09AM -0700, Wengang Wang wrote:
> On 4/23/20 6:39 PM, Dave Chinner wrote:
> > On Thu, Apr 23, 2020 at 04:19:52PM -0700, Wengang Wang wrote:
> > > On 4/23/20 4:14 PM, Wengang Wang wrote:
> > > > The real case I hit is that the process A is waiting for inode unpin on
> > > > XFS A which is a loop device backed mount.
> > > And actually, there is a dm-thin on top of the loop device..
> > Makes no difference, really, because it's still the loop device
> > that is doing the IO to the underlying filesystem...
> I mentioned the IO path here, not the IO itself.  In this case, the IO path
> includes dm-thin.
> 
> We have to consider it as long as we are not sure whether a GFP_KERNEL (or
> any flags without NOFS, NOIO) allocation happens in dm-thin.
> 
> If dm-thin has a GFP_KERNEL allocation and goes into direct memory reclaim,
> the deadlock forms.

If that happens, then that is a bug in dm-thin, not a bug in XFS.
There are rules to how memory allocation must be done to avoid
deadlocks, and one of those is that block device level IO path
allocations *must* use GFP_NOIO. This prevents reclaim from
recursing into subsystems that might require IO to reclaim memory
and hence self-deadlock because the IO layer requires allocation to
succeed to make forward progress.

That's why we have mempools and GFP_NOIO at the block and device
layers....
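
To make the rule concrete, here is a rough userspace model (not kernel
code). The MODEL_* names and their bit values are made up for
illustration; only the way the three masks are composed follows the
kernel's GFP_KERNEL, GFP_NOFS and GFP_NOIO definitions. It shows which
allocation contexts let reclaim recurse into IO or filesystem work:

```c
#include <stdbool.h>

/* Hypothetical bit values for a userspace model; the kernel's differ. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_RECLAIM 0x1u /* allocation may enter reclaim */
#define MODEL_GFP_IO      0x2u /* reclaim may start physical IO */
#define MODEL_GFP_FS      0x4u /* reclaim may call into filesystems */

/* Composed the same way as GFP_KERNEL, GFP_NOFS and GFP_NOIO. */
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* True if reclaim triggered by this allocation may recurse into a filesystem. */
static inline bool reclaim_may_enter_fs(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_RECLAIM) && (gfp & MODEL_GFP_FS);
}

/* True if reclaim triggered by this allocation may issue new IO. */
static inline bool reclaim_may_issue_io(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_RECLAIM) && (gfp & MODEL_GFP_IO);
}

/*
 * An allocation made in the block IO path is only safe if the reclaim it
 * triggers can never need more IO or filesystem work to complete.
 */
static inline bool safe_in_block_io_path(model_gfp_t gfp)
{
	return !reclaim_may_enter_fs(gfp) && !reclaim_may_issue_io(gfp);
}
```

In this model only MODEL_GFP_NOIO passes safe_in_block_io_path(). Note
that MODEL_GFP_NOFS does not: it keeps reclaim out of filesystems, but
reclaim can still issue IO, which is exactly what the block layer cannot
tolerate.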

> > > > And the backing file is from a different (X)FS B mount. So the IO is
> > > > going through loop device, (direct) writes to (X)FS B.
> > > > 
> > > > The (direct) writes to (X)FS B do memory allocations and then memory
> > > > direct reclaims...
> > The loop device issues IO to the lower filesystem in
> > memalloc_noio_save() context, which means all memory allocations in
> > its IO path are done with GFP_NOIO context. Hence those allocations
> > will not recurse into reclaim on -any filesystem- and hence will not
> > deadlock on filesystem reclaim. So what I said originally is correct
> > even when we take filesystems stacked via loop devices into account.
> You are right here. Seems loop device is doing NOFS|NOIO allocations.
> 
> The deadlock happened with a bit lower kernel version which is without loop
> device patch that does NOFS|NOIO allocation.

Right, the loop device used to have an allocation context bug, but
that has been fixed. Either way, this is not an XFS or even a
filesystem layer issue.
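
For anyone following along, the scoped behaviour of
memalloc_noio_save() can be sketched in userspace roughly like this.
This is a simplified model, not the kernel implementation: struct
model_task, the model_* helpers and all bit values are hypothetical.
The idea it illustrates is that the save call sets a per-task flag and
returns the previous state, and every allocation made while the flag
is set has its IO and FS permissions stripped, even if the caller
asked for GFP_KERNEL:

```c
/* Hypothetical bit values for a userspace model; the kernel's differ. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_RECLAIM 0x1u
#define MODEL_GFP_IO      0x2u
#define MODEL_GFP_FS      0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* Stand-in for the per-task PF_MEMALLOC_NOIO flag. */
#define MODEL_PF_MEMALLOC_NOIO 0x1u

struct model_task {
	unsigned int flags;
};

/* Enter a NOIO scope and return the previous state of the flag. */
static inline unsigned int model_noio_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC_NOIO;

	t->flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO scope, restoring whatever state the matching save saw. */
static inline void model_noio_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* What the allocator actually honours for a request made by this task. */
static inline model_gfp_t model_effective_gfp(const struct model_task *t,
					      model_gfp_t gfp)
{
	if (t->flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}
```

Inside a save/restore pair, model_effective_gfp() turns a
MODEL_GFP_KERNEL request into MODEL_GFP_NOIO, and because restore puts
back the saved state rather than clearing the flag, nested scopes
behave correctly. Without the scope the request stays
MODEL_GFP_KERNEL, which is the situation an unfixed loop device is in.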

> Well, here you are only talking about loop device, it's not enough to say
> it's also safe in case the memory reclaiming happens at higher layer above
> loop device in the IO path.

Yes it is.

Block devices and device drivers are *required* to use GFP_NOIO
context for memory allocations in the IO path. IOWs, any block
device that is doing GFP_KERNEL context allocation violates the
memory allocation rules we have for the IO path.  This architectural
constraint exists exclusively to avoid this entire class of IO-based
memory reclaim recursion deadlocks.

> > Hence I'll ask again: do you have stack traces of the deadlock or a
> > lockdep report? If not, can you please describe the storage setup
> > from top to bottom and lay out exactly where in what layers trigger
> > this deadlock?
> 
> Sharing the callback traces:

<snip>

Yeah, so the loop device is doing GFP_KERNEL allocation in a
GFP_NOIO context. You need to fix the loop device in whatever kernel
you are testing, which you have conveniently never mentioned. I'm
betting this is a vendor kernel that is missing fixes from the
upstream kernel. In which case you need to talk to your OS vendor,
not upstream...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx