Re: [PATCH v2] xfs: enable WQ_MEM_RECLAIM on m_sync_workqueue

"NeilBrown" <neilb@xxxxxxx> · Wed, 10 Jul 2024 09:12:58 +1000

On Thu, 04 Jul 2024, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 09:29:00PM +1000, NeilBrown wrote:
> > I know nothing of this stance.  Do you have a reference?
> 
> No particular one.
> 
> > I have put a modest amount of work into ensuring that NFS to a server
> > on the same machine works and last I checked it did - though I'm more
> > confident of NFSv3 than NFSv4 because of the state manager thread.
> 
> How do you propagate the NOFS flag (and NOIO for a loop device) to
> the server and the workqueues run by the server and the file system
> called by it?  How do you ensure WQ_MEM_RECLAIM gets propagated to
> all workqueues that could be called by the file system on the
> server (the problem kicking off this discussion)?
> 

Do we need to propagate these?

NOFS is for deadlock avoidance.  A filesystem "backend" (Dave's term - I
think for the parts of the fs that handle write-back) might allocate
memory; that allocation might block waiting for memory reclaim; and
memory reclaim might re-enter the filesystem backend and block on a lock
(or similar) that is held while allocating memory.  NOFS breaks that
deadlock.

The important thing here isn't the NOFS flag, it is breaking any
possible deadlock.

Layered filesystems introduce a new complexity.  The backend for one
filesystem can call into the front end of another filesystem.  That
front-end is not required to use NOFS and even if we impose
PF_MEMALLOC_NOFS, the front-end might wait for some work-queue action
which doesn't inherit the NOFS flag.

But this doesn't necessarily matter.  Calling into the filesystem is not
the problem - blocking waiting for a reply is the problem.  It is
blocking that creates deadlocks.  So if the backend of one filesystem
queues to a separate thread the work for the front end of the other
filesystem and doesn't wait for the work to complete, then a deadlock
cannot be introduced.

/dev/loop uses the loop%d workqueue for this.  loop-back NFS hands the
front-end work over to nfsd.  The proposed localio implementation uses an
nfslocaliod workqueue for exactly the same task.  These remove the
possibility of deadlock and mean that there is no need to pass NOFS
through to the front-end of the backing filesystem.
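
To make the hand-off concrete, here is a rough userspace model of it.  It
is only a sketch: every name in it (handoff_queue, handoff_submit,
handoff_drain, backend_writeback, frontend_write, backend_lock_held) is
made up for illustration and none of it is kernel API.  A plain bool
stands in for whatever lock the backend holds, and a single-threaded
"drain" stands in for the separate worker.  The only point is that the
backend queues the front-end work and returns without waiting, so the
front end never runs while the backend's lock is held.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct work_item {
	void (*fn)(void *arg);
	void *arg;
};

/* Stand-in for a dedicated queue such as loop%d or nfslocaliod. */
struct handoff_queue {
	struct work_item items[MAX_WORK];
	size_t head, tail;
};

/* Stand-in for a lock the backend holds during write-back. */
static bool backend_lock_held;
/* Front-end calls that ran while the backend lock was held. */
static int ran_under_backend_lock;
static int frontend_calls;

/* Front end of the lower filesystem: may allocate and enter reclaim. */
static void frontend_write(void *arg)
{
	(void)arg;
	frontend_calls++;
	if (backend_lock_held)
		ran_under_backend_lock++; /* reclaim could block on the lock */
}

/* Queue the work and return at once; the caller never waits for it. */
static bool handoff_submit(struct handoff_queue *q,
			   void (*fn)(void *), void *arg)
{
	if (q->tail - q->head == MAX_WORK)
		return false; /* queue full */
	q->items[q->tail % MAX_WORK] = (struct work_item){ fn, arg };
	q->tail++;
	return true;
}

/* Worker context: runs queued items, holding none of the backend's locks. */
static int handoff_drain(struct handoff_queue *q)
{
	int done = 0;

	while (q->head != q->tail) {
		struct work_item w = q->items[q->head % MAX_WORK];

		q->head++;
		assert(w.fn != NULL);
		w.fn(w.arg);
		done++;
	}
	return done;
}

/* Backend write-back: take the lock, queue the lower-fs write, drop the lock. */
static bool backend_writeback(struct handoff_queue *q)
{
	bool queued;

	backend_lock_held = true;
	queued = handoff_submit(q, frontend_write, NULL);
	backend_lock_held = false;
	return queued;
}
```

If backend_writeback() called frontend_write() directly instead of
queueing it, the model would count that call as running under the backend
lock, which is the case where reclaim re-entering the backend could block.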

Note that there is a separate question concerning pageout to a swap
file.  pageout needs more than just deadlock avoidance.  It needs
guaranteed progress in low memory conditions.  It needs PF_MEMALLOC (or
mempools) and that cannot be finessed using work queues.  I don't think
that Linux is able to support pageout through layered filesystems.

So while I support loop-back NFS and swap-over-NFS, I don't support them
in combination.  We don't support swap on /dev/loop when it is backed by
a file - for that we have swap-to-file.

Thank you for challenging me on this - it helped me clarify my thoughts
and understanding for myself.

NeilBrown