Just to say, we see similar problems with NFSv3 servers re-exported to NFSv3 clients. In our case we have a single server re-exporting multiple NFSv3 remote server mounts. If one of those re-exported mounts goes "bad" (network loss, network congestion, server load), the knfsd threads are slowly consumed by eager clients of that (hung) mount until there are no threads left to serve the clients of all the other mounts/servers being re-exported by that same server (which are still good). The "softerr" mount option on the re-export server does not help with this, and the default svc_rpc_per_connection_limit can make it much worse by allowing a handful of clients to lock up all the knfsd threads very quickly.

Even when the conditions of that "bad" server improve, there seems to be a feedback loop between the re-export server's retrans and the retrans of the clients of the re-export server, which means many duplicate lookups occur for a long time - it is often quicker to just reboot the re-export server. Even worse, these duplicate lookups can themselves cause high ops load on the original server, so requests time out and retrans again, and so on.

The only things we have found to make this a little more bearable are to increase the timeo (>30 mins) to minimise retrans and to set svc_rpc_per_connection_limit=4. This at least reduces the chance that the clients of one bad mount consume every knfsd thread, so a single re-export server serving multiple mounts can remain responsive for all the other mounts it serves. The other option would be to dedicate a unique re-export server to a single mountpoint, but there are resource constraints when you have 30+ servers and mounts to deal with.
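For illustration, the mitigation above looks roughly like the following on the re-export server. Server names, export paths and the choice of exactly 30 minutes are placeholders, not our real configuration:

```shell
# Hypothetical sketch of the mitigation described above.

# Mount the remote NFSv3 export with a very long timeout to minimise
# retransmissions. timeo is in tenths of a second, so 18000 = 30 minutes.
# softerr lets stuck requests eventually error out instead of hanging forever.
mount -t nfs -o vers=3,softerr,timeo=18000,retrans=1 \
    origin-server:/export /srv/reexport/origin

# Cap how many knfsd threads a single client connection may occupy, so a
# few eager clients of one hung mount cannot consume them all.
echo 4 > /proc/sys/sunrpc/svc_rpc_per_connection_limit
```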
We are still unable to use NFSv4 for our workloads because they often involve high latency re-export servers 150+ms away and NFSv4 re-export server performance is still limited by parallel metadata ops:

https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@xxxxxxxxxxxxxx/#t
https://bugzilla.linux-nfs.org/show_bug.cgi?id=375

Daire

On Mon, 11 Sept 2023 at 23:01, <trondmy@xxxxxxxxx> wrote:
>
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>
> When re-exporting a NFSv4.x filesystem through knfsd, we want to ensure
> that the individual knfsd threads don't get stuck waiting for the server
> in a NFS4ERR_DELAY loop. While it may make sense to have the re-exported
> client retry a few times, particularly when the clients are using NFSv3,
> ultimately we want to just punt a EAGAIN back to knfsd, so that it can
> return NFS4ERR_DELAY/NFS3ERR_JUKEBOX, and free up the thread.
>
> With that in mind, add a client module parameter, 'delay_retrans', that
> specifies how many times a 'softerr' mounted NFSv4 filesystem should
> retry before returning EAGAIN.
> In order to avoid disrupting existing setups, the feature is disabled by
> default, however it can be enabled by specifying a positive value for
> the new parameter.
>
> Trond Myklebust (2):
>   NFSv4: Add a parameter to limit the number of retries after
>     NFS4ERR_DELAY
>   NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
>
>  .../admin-guide/kernel-parameters.txt |  7 +++
>  fs/nfs/nfs4_fs.h                      |  2 +
>  fs/nfs/nfs4proc.c                     | 43 +++++++++++++++----
>  fs/nfs/pnfs.c                         |  8 +++-
>  fs/nfs/pnfs.h                         |  5 ++-
>  fs/nfs/super.c                        |  8 +++-
>  fs/nfs/write.c                        |  2 +
>  7 files changed, 63 insertions(+), 12 deletions(-)
>
> --
> 2.41.0
>
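For anyone wanting to try the series, enabling the new parameter on a re-export server would presumably look something like the following. This is an untested sketch based only on the cover letter; the retry count of 5 is arbitrary, and the exact parameter path assumes it lands in the nfs module as described:

```shell
# Sketch of enabling the proposed delay_retrans parameter (assumption:
# it is an 'nfs' module parameter, per the cover letter and diffstat).
# The feature is off by default; a positive value enables it.

# At module load time:
modprobe nfs delay_retrans=5

# Or, if the module is already loaded, via sysfs:
echo 5 > /sys/module/nfs/parameters/delay_retrans

# The NFSv4 client mount on the re-export server must use softerr for
# the retry limit to apply:
mount -t nfs -o vers=4.2,softerr origin-server:/export /srv/reexport/origin
```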