On Fri, Jan 27, 2017 at 12:07:34PM -0500, Brian Foster wrote:
> The problem looks like a race between dquot reclaim and quotacheck. The
> high level sequence of events is as follows:
>
> - During quotacheck, xfs_qm_dqiterate() walks the physical dquot
>   buffers and queues them to the delwri queue.
> - Next, kswapd kicks in and attempts to reclaim a dquot that is backed
>   by a buffer on the quotacheck delwri queue. xfs_qm_dquot_isolate()
>   acquires the flush lock and attempts to queue to the reclaim delwri
>   queue. This silently fails because the buffer is already queued.
>
>   From this point forward, the dquot flush lock is not going to be
>   released until the buffer is submitted for I/O and completed via
>   quotacheck.
> - Quotacheck continues on to the xfs_qm_flush_one() pass, hits the
>   dquot in question and waits on the flush lock to issue the flush of
>   the recalculated values. *deadlock*
>
> There are at least a few ways to deal with this. We could do something
> granular to fix up the reclaim path to check whether the buffer is
> already queued or something of that nature before we actually invoke the
> flush. I think this is effectively pointless, however, because the first
> part of quotacheck walks and queues all physical dquot buffers anyways.
>
> In other words, I think dquot reclaim during quotacheck should probably
> be bypassed.
....
> Note that I think this does mean that you could still have low memory
> issues if you happen to have a lot of quotas defined..

Hmmm..... Really needs fixing.

I think submitting the buffer list after xfs_qm_dqiterate() and
waiting for completion will avoid this problem.

However, I suspect reclaim can still race with flushing, so we need
to detect "stuck" dquots, submit the delwri buffer queue and wait,
then flush the dquot again.

Cheers,

Dave.
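
To make the "detect stuck dquots, submit, then flush again" sequence
concrete, here is a rough toy model in Python. It is only a sketch
and not XFS code: Buffer, Dquot, delwri_queue(), delwri_submit(),
flush() and flush_one() are hypothetical stand-ins for the kernel
objects and helpers discussed above. It assumes, as described in the
quoted text, that the flush lock is dropped only when the backing
buffer completes I/O.

```python
class Buffer:
    """Hypothetical stand-in for a dquot buffer."""
    def __init__(self):
        self.queued = False   # already sitting on some delwri queue
        self.dquots = []      # dquots whose flush lock drops at I/O completion


class Dquot:
    """Hypothetical stand-in for an in-memory dquot."""
    def __init__(self, buf):
        self.buf = buf
        self.flush_locked = False


def delwri_queue(buf, queue):
    """Queue buf; silently does nothing if it is already queued elsewhere."""
    if buf.queued:
        return False
    buf.queued = True
    queue.append(buf)
    return True


def delwri_submit(queue):
    """Model 'submit and wait': completion releases attached flush locks."""
    for buf in queue:
        for dq in buf.dquots:
            dq.flush_locked = False
        buf.dquots.clear()
        buf.queued = False
    queue.clear()


def flush(dq, queue):
    """Try to take the flush lock, attach to the buffer and queue it."""
    if dq.flush_locked:
        return False          # someone else holds the flush lock
    dq.flush_locked = True
    dq.buf.dquots.append(dq)
    delwri_queue(dq.buf, queue)
    return True


def flush_one(dq, queue):
    """Sketch of the retry idea: a stuck dquot forces a submit, then a reflush."""
    if not flush(dq, queue):
        delwri_submit(queue)  # push out the buffer pinning the flush lock
        if not flush(dq, queue):
            raise RuntimeError("dquot still stuck after submit")


def scenario():
    """Replay the race from the quoted report against the toy model."""
    qc_queue, reclaim_queue = [], []
    buf = Buffer()
    dq = Dquot(buf)
    delwri_queue(buf, qc_queue)   # quotacheck walk queues the buffer
    flush(dq, reclaim_queue)      # reclaim takes the lock; queueing fails
    stuck = dq.flush_locked and not reclaim_queue
    flush_one(dq, qc_queue)       # quotacheck flush pass with the retry
    return stuck, dq, qc_queue
```

In this model a blocking wait on the flush lock in flush_one() would
never return, because the only queue that can release the lock is the
one the waiter itself has yet to submit. Trying the lock, submitting
and retrying breaks that cycle. The model does not cover reclaim
racing again after the submit.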
-- 
Dave Chinner
david@xxxxxxxxxxxxx