Neil,

Firstly, thank you for your work on this. I'm probably the main beneficiary of this (NFSD) effort atm, so I feel extra special and lucky!

I have done some quick artificial tests similar to before, where I am using an NFS server and client separated by an (extreme) 200ms of latency (great for testing parallelism). I am only using NFSv3 due to the NFSD_CACHE_SIZE_SLOTS_PER_SESSION parallelism limitations for NFSv4.

First, a client direct to server (VFS) with 10 simultaneous create processes hitting the same directory:

client1 # for x in {1..1000}; do echo /srv/server1/data/touch.$x; done | xargs -n1 -P 10 -iX -t touch X 2>&1 | pv -l -a >|/dev/null

Without the patch (on the client), this reaches a steady state of 2.4 creates/s, and increasing the number of parallel create processes does not change this aggregate performance. With the patch, the creation rate increases to 15 creates/s, and with 100 processes it scales further to 121 creates/s.

Now for the re-export case (NFSD), where an intermediary server re-exports the originating server (200ms away) to clients on its local LAN: there is no noticeable improvement for a single (unpatched) client, but we do see an aggregate improvement when we use multiple clients at once.

# pdsh -Rssh -w 'client[1-10]' 'for x in {1..1000}; do echo /srv/reexport1/data/$(hostname -s).$x; done | xargs -n1 -P 10 -iX -t touch X 2>&1' | pv -l -a >|/dev/null

Without the patch applied to the re-export server, the aggregate is around 2.2 creates/s, which is similar to doing it directly to the originating server from a single client (above). With the patch, the aggregate increases to 15 creates/s for 10 clients, which again matches the results of a single patched client. Not quite a x10 increase, but a healthy improvement nonetheless.

However, it is at this point that I started to experience some stability issues with the re-export server that are not present with the vanilla unpatched v5.19-rc2 kernel. In particular, the knfsd threads start to lock up with stack traces like this:

[ 1234.460696] INFO: task nfsd:5514 blocked for more than 123 seconds.
[ 1234.461481]       Tainted: G        W   E     5.19.0-1.dneg.x86_64 #1
[ 1234.462289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1234.463227] task:nfsd            state:D stack:    0 pid: 5514 ppid:     2 flags:0x00004000
[ 1234.464212] Call Trace:
[ 1234.464677]  <TASK>
[ 1234.465104]  __schedule+0x2a9/0x8a0
[ 1234.465663]  schedule+0x55/0xc0
[ 1234.466183]  ? nfs_lookup_revalidate_dentry+0x3a0/0x3a0 [nfs]
[ 1234.466995]  __nfs_lookup_revalidate+0xdf/0x120 [nfs]
[ 1234.467732]  ? put_prev_task_stop+0x170/0x170
[ 1234.468374]  nfs_lookup_revalidate+0x15/0x20 [nfs]
[ 1234.469073]  lookup_dcache+0x5a/0x80
[ 1234.469639]  lookup_one_unlocked+0x59/0xa0
[ 1234.470244]  lookup_one_len_unlocked+0x1d/0x20
[ 1234.470951]  nfsd_lookup_dentry+0x190/0x470 [nfsd]
[ 1234.471663]  nfsd_lookup+0x88/0x1b0 [nfsd]
[ 1234.472294]  nfsd3_proc_lookup+0xb4/0x100 [nfsd]
[ 1234.473012]  nfsd_dispatch+0x161/0x290 [nfsd]
[ 1234.473689]  svc_process_common+0x48a/0x620 [sunrpc]
[ 1234.474402]  ? nfsd_svc+0x330/0x330 [nfsd]
[ 1234.475038]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd]
[ 1234.475772]  svc_process+0xbc/0xf0 [sunrpc]
[ 1234.476408]  nfsd+0xda/0x190 [nfsd]
[ 1234.477011]  kthread+0xf0/0x120
[ 1234.477522]  ? kthread_complete_and_exit+0x20/0x20
[ 1234.478199]  ret_from_fork+0x22/0x30
[ 1234.478755]  </TASK>

For whatever reason, they seem to affect our Netapp mounts and re-exports rather than our originating Linux NFS servers (against which all tests were done).
This may be related to the fact that those Netapps serve our home directories, so there could be some unique locking patterns going on there. This issue made things a bit too unstable to test at larger scales or with our production workloads.

So all in all, the performance improvements in the knfsd re-export case are looking great, and we have real-world use cases that this helps with (batch processing workloads with latencies >10ms). If we can figure out the hanging knfsd threads, then I can test it more heavily.

Many thanks,

Daire

On Tue, 14 Jun 2022 at 00:19, NeilBrown <neilb@xxxxxxx> wrote:
>
> VFS currently holds an exclusive lock on a directory during create,
> unlink, rename.  This imposes serialisation on all filesystems though
> some may not benefit from it, and some may be able to provide finer
> grained locking internally, thus reducing contention.
>
> This series allows the filesystem to request that the inode lock be
> shared rather than exclusive.  In that case an exclusive lock will be
> held on the dentry instead, much as is done for parallel lookup.
>
> The NFS filesystem can easily support concurrent updates (server does
> any needed serialisation) so it is converted.
>
> This series also converts nfsd to use the new interfaces so concurrent
> incoming NFS requests in the one directory can be handled concurrently.
>
> As a net result, if an NFS mounted filesystem is re-exported over NFS,
> then multiple clients can create files in a single directory and all
> synchronisation will be handled on the final server.  This helps hide
> latency on the link from client to server.
>
> I include a few nfsd patches that aren't strictly needed for this work,
> but seem to be a logical consequence of the changes that I did have to
> make.
>
> I have only tested this lightly.  In particular the rename support is
> quite new and I haven't tried to break it yet.
>
> I post this for general review, and hopefully extra testing... Daire
> Byrne has expressed interest in the NFS re-export parallelism.
>
> NeilBrown
>
>
> ---
>
> NeilBrown (12):
>       VFS: support parallel updates in the one directory.
>       VFS: move EEXIST and ENOENT tests into lookup_hash_update()
>       VFS: move want_write checks into lookup_hash_update()
>       VFS: move dput() and mnt_drop_write() into done_path_update()
>       VFS: export done_path_update()
>       VFS: support concurrent renames.
>       NFS: support parallel updates in the one directory.
>       nfsd: allow parallel creates from nfsd
>       nfsd: support concurrent renames.
>       nfsd: reduce locking in nfsd_lookup()
>       nfsd: use (un)lock_inode instead of fh_(un)lock
>       nfsd: discard fh_locked flag and fh_lock/fh_unlock
>
>
>  fs/dcache.c            |  59 ++++-
>  fs/namei.c             | 578 ++++++++++++++++++++++++++++++++---------
>  fs/nfs/dir.c           |  29 ++-
>  fs/nfs/inode.c         |   2 +
>  fs/nfs/unlink.c        |   5 +-
>  fs/nfsd/nfs2acl.c      |   6 +-
>  fs/nfsd/nfs3acl.c      |   4 +-
>  fs/nfsd/nfs3proc.c     |  37 +--
>  fs/nfsd/nfs4acl.c      |   7 +-
>  fs/nfsd/nfs4proc.c     |  61 ++---
>  fs/nfsd/nfs4state.c    |   8 +-
>  fs/nfsd/nfsfh.c        |  10 +-
>  fs/nfsd/nfsfh.h        |  58 +----
>  fs/nfsd/nfsproc.c      |  31 +--
>  fs/nfsd/vfs.c          | 243 ++++++++---------
>  fs/nfsd/vfs.h          |   8 +-
>  include/linux/dcache.h |  27 ++
>  include/linux/fs.h     |   1 +
>  include/linux/namei.h  |  30 ++-
>  19 files changed, 791 insertions(+), 413 deletions(-)
>
> --
> Signature
>
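For anyone skimming the cover letter above, the shape of the change is roughly: instead of always taking the directory's i_rwsem exclusively for create/unlink/rename, a filesystem that opts in gets a shared directory lock, and per-name exclusion is taken on the dentry itself, much as parallel lookup already does. A minimal conceptual sketch of that idea, not the actual patch API: the opt-in flag and lock_dentry_for_update() helper below are made up for illustration, while inode_lock_shared_nested()/inode_lock_nested() and I_MUTEX_PARENT are existing VFS primitives.

#include <linux/fs.h>
#include <linux/dcache.h>

/* Conceptual sketch only -- not the interface the series adds. */
static void start_dir_update(struct inode *dir, struct dentry *dentry,
			     bool fs_allows_parallel_updates /* hypothetical opt-in */)
{
	if (fs_allows_parallel_updates) {
		/* Shared lock: other updates in this directory may proceed
		 * concurrently; the filesystem (or, for NFS, the far server)
		 * does whatever serialisation it actually needs. */
		inode_lock_shared_nested(dir, I_MUTEX_PARENT);
		/* Exclusion for this particular name is then taken on the
		 * dentry itself, in the spirit of parallel lookup. */
		lock_dentry_for_update(dentry);	/* hypothetical helper */
	} else {
		/* Current behaviour: one update at a time per directory. */
		inode_lock_nested(dir, I_MUTEX_PARENT);
	}
}

The interesting property for the re-export case is that the shared directory lock lets many nfsd threads issue creates in the same directory at once, so the 200ms round trips overlap instead of queueing behind a single exclusive lock.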