On Thu, 04 Jul 2024, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 09:29:00PM +1000, NeilBrown wrote:
> > I know nothing of this stance. Do you have a reference?
>
> No particular one.
>
> > I have put a modest amount of work into ensuring NFS to a server on the
> > same machine works and last I checked it did - though I'm more
> > confident of NFSv3 than NFSv4 because of the state manager thread.
>
> How do you propagate the NOFS flag (and NOIO for a loop device) to
> the server and the workqueues run by the server and the file system
> called by it? How do you ensure WQ_MEM_RECLAIM gets propagated to
> all workqueues that could be called by the file system on the
> server (the problem kicking off this discussion)?
>

Do we need to propagate these?

NOFS is for deadlock avoidance. A filesystem "backend" (Dave's term - I
think for the parts of the fs that handle write-back) might allocate
memory; that allocation might block waiting for memory reclaim; memory
reclaim might re-enter the filesystem backend and block on a lock (or
similar) that is held while allocating memory. NOFS breaks that
deadlock.

The important thing here isn't the NOFS flag, it is breaking any
possible deadlock.

Layered filesystems introduce a new complexity. The backend for one
filesystem can call into the front-end of another filesystem. That
front-end is not required to use NOFS and, even if we impose
PF_MEMALLOC_NOFS, the front-end might wait for some workqueue action
which doesn't inherit the NOFS flag.

But this doesn't necessarily matter. Calling into the filesystem is not
the problem - blocking waiting for a reply is the problem. It is
blocking that creates deadlocks. So if the backend of one filesystem
queues the work for the front-end of the other filesystem to a separate
thread and doesn't wait for that work to complete, then a deadlock
cannot be introduced.

/dev/loop uses the loop%d workqueue for this. Loop-back NFS hands the
front-end work over to nfsd. The proposed localio implementation uses
an nfslocaliod workqueue for exactly the same task. These remove the
possibility of deadlock and mean that there is no need to pass NOFS
through to the front-end of the backing filesystem.

Note that there is a separate question concerning pageout to a swap
file. Pageout needs more than just deadlock avoidance. It needs
guaranteed progress in low memory conditions. It needs PF_MEMALLOC (or
mempools) and that cannot be finessed using workqueues. I don't think
that Linux is able to support pageout through layered filesystems.

So while I support loop-back NFS and swap-over-NFS, I don't support them
in combination. We don't support swap on /dev/loop when it is backed by
a file - for that we have swap-to-file.

Thank you for challenging me on this - it helped me clarify my thoughts
and understanding for myself.

NeilBrown