Just to say, we see similar problems with NFSv3 servers re-exported to NFSv3 clients. In our case we have a single server re-exporting multiple NFSv3 remote server mounts. If one of those re-exported mounts goes "bad" (network loss, network congestion, server load), the knfsd threads are slowly consumed by eager clients of that (hung) mount until there are no threads left to serve the clients of all the other mounts/servers being re-exported by that same server (which are still good). The "softerr" mount option on the re-export server does not help with this, and the default svc_rpc_per_connection_limit can make it much worse by allowing a handful of clients to lock up all the knfsd threads very quickly.

Even when the conditions of that "bad" server improve, there seems to be a feedback loop between the re-export server's retrans and the retrans of the clients of the re-export server, which means many duplicate lookups occur for a long time - it is often quicker to just reboot the re-export server. Even worse, these duplicate lookups can themselves cause high ops load on the original server, so requests time out and retrans again, and so on.

The only things we have found to make this a little more bearable are to increase the timeo (>30 mins) to minimise retrans and to set svc_rpc_per_connection_limit=4. This at least reduces the chance that the clients of one bad mount consume every knfsd thread, so a single re-export server serving multiple mounts can remain responsive for all the other mounts it serves. The other option would be to dedicate a unique re-export server to a single mountpoint, but there are resource constraints when you have 30+ servers and mounts to deal with.
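For illustration, the mitigation above looks roughly like the following on the re-export server. Server names, export paths and the choice of exactly 30 minutes are placeholders, not our real configuration:

```shell
# Hypothetical sketch of the mitigation described above.

# Mount the remote NFSv3 export with a very long timeout to minimise
# retransmissions. timeo is in tenths of a second, so 18000 = 30 minutes.
# softerr lets stuck requests eventually error out instead of hanging forever.
mount -t nfs -o vers=3,softerr,timeo=18000,retrans=1 \
    origin-server:/export /srv/reexport/origin

# Cap how many knfsd threads a single client connection may occupy, so a
# few eager clients of one hung mount cannot consume them all.
echo 4 > /proc/sys/sunrpc/svc_rpc_per_connection_limit
```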
We are still unable to use NFSv4 for our workloads because they often involve high latency re-export servers 150+ms away and NFSv4 re-export server performance is still limited by parallel metadata ops:

https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@xxxxxxxxxxxxxx/#t
https://bugzilla.linux-nfs.org/show_bug.cgi?id=375

Daire

On Mon, 11 Sept 2023 at 23:01, <trondmy@xxxxxxxxx> wrote:
>
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>
> When re-exporting a NFSv4.x filesystem through knfsd, we want to ensure
> that the individual knfsd threads don't get stuck waiting for the server
> in a NFS4ERR_DELAY loop. While it may make sense to have the re-exported
> client retry a few times, particularly when the clients are using NFSv3,
> ultimately we want to just punt a EAGAIN back to knfsd, so that it can
> return NFS4ERR_DELAY/NFS3ERR_JUKEBOX, and free up the thread.
>
> With that in mind, add a client module parameter, 'delay_retrans', that
> specifies how many times a 'softerr' mounted NFSv4 filesystem should
> retry before returning EAGAIN.
> In order to avoid disrupting existing setups, the feature is disabled by
> default, however it can be enabled by specifying a positive value for
> the new parameter.
>
> Trond Myklebust (2):
>   NFSv4: Add a parameter to limit the number of retries after
>     NFS4ERR_DELAY
>   NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
>
>  .../admin-guide/kernel-parameters.txt |  7 +++
>  fs/nfs/nfs4_fs.h                      |  2 +
>  fs/nfs/nfs4proc.c                     | 43 +++++++++++++++----
>  fs/nfs/pnfs.c                         |  8 +++-
>  fs/nfs/pnfs.h                         |  5 ++-
>  fs/nfs/super.c                        |  8 +++-
>  fs/nfs/write.c                        |  2 +
>  7 files changed, 63 insertions(+), 12 deletions(-)
>
> --
> 2.41.0
>
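For anyone wanting to try the series, enabling the new parameter on a re-export server would presumably look something like the following. This is an untested sketch based only on the cover letter; the retry count of 5 is arbitrary, and the exact parameter path assumes it lands in the nfs module as described:

```shell
# Sketch of enabling the proposed delay_retrans parameter (assumption:
# it is an 'nfs' module parameter, per the cover letter and diffstat).
# The feature is off by default; a positive value enables it.

# At module load time:
modprobe nfs delay_retrans=5

# Or, if the module is already loaded, via sysfs:
echo 5 > /sys/module/nfs/parameters/delay_retrans

# The NFSv4 client mount on the re-export server must use softerr for
# the retry limit to apply:
mount -t nfs -o vers=4.2,softerr origin-server:/export /srv/reexport/origin
```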