Re: [PATCH] xfs: don't change to infinite lock to avoid deadlock

On Thu, Apr 23, 2020 at 04:19:52PM -0700, Wengang Wang wrote:
> 
> On 4/23/20 4:14 PM, Wengang Wang wrote:
> > 
> > On 4/23/20 4:05 PM, Dave Chinner wrote:
> > > On Thu, Apr 23, 2020 at 10:23:25AM -0700, Wengang Wang wrote:
> > > > xfs_reclaim_inodes_ag() takes pag_ici_reclaim_lock with a blocking
> > > > (infinite) lock on the 2nd round of walking all AGs when SYNC_TRYLOCK
> > > > is set (conditionally).
> > > > That causes a deadlock in a particular situation:
> > > > 
> > > > 1) In a heavy memory load environment, process A is doing direct memory
> > > > reclaim, waiting for xfs_inode.i_pincount to be cleared while holding the
> > > > mutex pag_ici_reclaim_lock.
> > > > 
> > > > 2) i_pincount is increased by adding the xfs_inode to a journal
> > > > transaction, and it's expected to be decreased when the IO related to
> > > > that transaction is done.
> > > > Step 1) happens after i_pincount is increased and before the
> > > > transaction IO is issued.
> > > > 
> > > > 3) Now the transaction IO is issued by process B. In the IO path (IO
> > > > could be more complex than you think), memory allocation and direct
> > > > memory reclaim happen too.
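
(For context, the logic in question looks roughly like this -- a paraphrased
sketch of the SYNC_TRYLOCK/second-pass handling in xfs_reclaim_inodes_ag(),
not the exact fs/xfs/xfs_icache.c source; flags, nr_to_scan, etc. are the
function's usual parameters:)

	trylock = flags & SYNC_TRYLOCK;
restart:
	skipped = 0;
	while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
		ag = pag->pag_agno + 1;		/* advance cursor for next lookup */

		if (trylock) {
			/* first pass: skip AGs whose reclaim lock is busy */
			if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
				skipped++;
				xfs_perag_put(pag);
				continue;
			}
		} else {
			/* later pass: block until the per-AG lock is free */
			mutex_lock(&pag->pag_ici_reclaim_lock);
		}

		/* ... walk and reclaim inodes in this AG ... */

		mutex_unlock(&pag->pag_ici_reclaim_lock);
		xfs_perag_put(pag);
	}

	/* SYNC_WAIT callers go around again, now taking blocking locks */
	if (skipped && (flags & SYNC_WAIT) && *nr_to_scan > 0) {
		trylock = 0;
		goto restart;
	}
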
> > > Sure, but IO path allocations are done under GFP_NOIO context, which
> > > means IO path allocations can't recurse back into filesystem reclaim
> > > via direct reclaim. Hence there should be no way for an IO path
> > > allocation to block on XFS inode reclaim and hence there's no
> > > possible deadlock here...
> > > 
> > > IOWs, I don't think this is the deadlock you are looking for. Do you
> > > have a lockdep report or some other set of stack traces that lead
> > > you to this point?
> > 
> > As I mentioned, the IO path can be more complex than you think.

I don't think the IO path is complex. I *know* the IO path is
complex. I also know how we manage that complexity to prevent things
like stacked devices and filesystems from deadlocking, but I also
know that there are bugs and other architectural deficiencies that
may be playing a part here.

The problem is, your description of the problem tells me nothing
about where the problem might lie. You are telling me what you think
the problem is, rather than explaining the way the problem comes
about, what storage stack configuration and IO behaviour triggers
it, etc. Hence I cannot determine what the problem you are seeing
actually is, and hence I cannot evaluate whether your patch is
correct.

> > The real case I hit is that process A is waiting for inode unpin on
> > XFS A, which is a loop-device-backed mount.
> 
> And actually, there is a dm-thin on top of the loop device...

Makes no difference, really, because it's still the loop device
that is doing the IO to the underlying filesystem...

> > And the backing file is from a different (X)FS B mount. So the IO is
> > going through the loop device as (direct) writes to (X)FS B.
> > 
> > The (direct) writes to (X)FS B do memory allocations and then enter
> > direct memory reclaim...

The loop device issues IO to the lower filesystem in
memalloc_noio_save() context, which means all memory allocations in
its IO path are done in GFP_NOIO context. Hence those allocations
will not recurse into reclaim on -any filesystem- and hence will not
deadlock on filesystem reclaim. So what I said originally is correct
even when we take filesystems stacked via loop devices into account.
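
Roughly, the loop worker does something like the following before it touches
the backing file (a sketch of the pattern, not the actual drivers/block/loop.c
code; do_backing_file_io() is just a stand-in name):

	unsigned int noio_flags;

	/*
	 * Scoped NOIO (see <linux/sched/mm.h>): every allocation below
	 * this point implicitly behaves as GFP_NOIO -- no __GFP_IO, no
	 * __GFP_FS -- so direct reclaim from those allocations cannot
	 * recurse back into filesystem or block IO.
	 */
	noio_flags = memalloc_noio_save();

	do_backing_file_io(cmd);	/* stand-in for the real submission path */

	memalloc_noio_restore(noio_flags);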

Hence I'll ask again: do you have stack traces of the deadlock or a
lockdep report? If not, can you please describe the storage setup
from top to bottom and lay out exactly where, and in which layers,
this deadlock triggers?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


