Re: [PATCH v2] xfs: enable WQ_MEM_RECLAIM on m_sync_workqueue

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 4 Jul 2024 09:02:09 +1000

On Wed, Jul 03, 2024 at 07:15:28AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 09:29:00PM +1000, NeilBrown wrote:
> > I know nothing of this stance.  Do you have a reference?
> 
> No particular one.
> 
> > I have put a modest amount of work into ensure NFS to a server on the
> > same machine works and last I checked it did - though I'm more
> > confident of NFSv3 than NFSv4 because of the state manager thread.
> 
> How do you propagate the NOFS flag (and NOIO for a loop device) to
> the server an the workqueues run by the server and the file system
> call by it?  How do you ensure WQ_MEM_RECLAIM gets propagate to
> all workqueues that could be called by the file system on the
> server (the problem kicking off this discussion)?

Don't forget PF_LOCAL_THROTTLE, too.  I note that nfsd_vfs_write()
knows when it is doing local loopback write IO and in that case sets
PF_LOCAL_THROTTLE:

	if (test_bit(RQ_LOCAL, &rqstp->rq_flags) &&
            !(exp_op_flags & EXPORT_OP_REMOTE_FS)) {
                /*
                 * We want throttling in balance_dirty_pages()
                 * and shrink_inactive_list() to only consider
                 * the backingdev we are writing to, so that nfs to
                 * localhost doesn't cause nfsd to lock up due to all
                 * the client's dirty pages or its congested queue.
                 */
                current->flags |= PF_LOCAL_THROTTLE;
                restore_flags = true;
        }

This also has impact on memory reclaim congestion throttling (i.e.
it turns it off), which is also needed for loopback IO to prevent it
being throttled by reclaim because it getting congested trying to
reclaim all the dirty pages on the upper filesystem that the IO
thread is trying to clean...

However, I don't see it calling memalloc_nofs_save() there to
prevent memory reclaim recursion back into the upper NFS client
filesystem.

I suspect that because filesystems like XFS hard code GFP_NOFS
context for page cache allocation to prevent NFSD loopback IO from
deadlocking hides this issue. We've had to do that because,
historically speaking, there wasn't been a way for high level IO
submitters to indicate they need GFP_NOFS allocation context.

Howver, we have had the memalloc_nofs_save/restore() scoped API for
several years now, so it seems to me that the nfsd should really be
using this rather than requiring the filesystem to always use
GFP_NOFS allocations to avoid loopback IO memory allocation
deadlocks...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx