On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> In some rare failure modes, the server is actually reading the
> transport, but then just dropping the requests on the floor.
> TCP_USER_TIMEOUT cannot detect that case.
>
> Prevent such a stuck server from pinning client resources
> indefinitely by ensuring that session and client ID clean-up can
> time out even if the connection is still operational.
>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  fs/nfs/nfs4client.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 28431acd1230..c5032f784ac0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct nfs_client *clp)
>
>  static void nfs4_shutdown_client(struct nfs_client *clp)
>  {
> +	clp->cl_rpcclient->cl_noretranstimeo = 0;
>  	if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
>  		nfs4_kill_renewd(clp);
>  	clp->cl_mvops->shutdown_client(clp);
>

I can't see how this will help. Again, I suggest we rather turn off the
retransmission default for the RPC calls where the server can drop stuff
on the floor. That's really only the RPCSEC_GSS control messages.
Anything else is covered by the NFSv4 blanket ban on dropping RPC calls.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx