Re: mount options not propagating to NFSACL and NSM RPC clients

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-04-11 11:20:14, Benjamin Coddington wrote:
> > So looks like the request is not being retransmitted. Just to be sure,
> > if I cause the nfsd to drop the regular NFS3 prog I/Os like ACCESS and
> > LOOKUP, I only get the expected 5 seconds delay following a successful
> > retry.
> >
> > Seems we only have an issue with the NFS3ACL prog.
> 
> It looks like the client_acl program gets created with
> rpc_bind_new_program() which doesn't setup the timeouts/retry strategy, and
> there's nothing after the setup to do it either.
> 
> I think the problem has existed since 331702337f2b2..  I think this should
> fix it up, would you like to test it?

Please allow me to propose a different change, which I already tested.

Looks to me that the 2012 change that I mentioned below is actually when
the problem started happening, when the `kmemdup` call was removed, so
since then we are simply just missing a field copy, and we are taking
the timeout from transport-based default timeout globals instead.

Also, I have a hunch that we need to do something different, because of
the following code in `rpc_new_client`:

    clnt->cl_rtt = &clnt->cl_rtt_default;
    rpc_init_rtt(&clnt->cl_rtt_default, clnt->cl_timeout->to_initval);

Only setting `clnt->cl_timeout` _after_ client clone does not seem to be
right if there are `cl_rtt_default` calculations that depend on it. So
may as well need to call `rpc_init_rtt` too.

--

>From 55737f82a9bb3e490836d10491995c8082ebcf11 Mon Sep 17 00:00:00 2001
From: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
Date: Thu, 11 Apr 2024 18:30:56 +0300
Subject: [PATCH] sunrpc: fix NFSACL RPC retry on soft mount

It used to be quite awhile ago since 1b63a75180c6 ('SUNRPC: Refactor
rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that
all mount parameters propagate to NFSACL clients. However since that
change, if mount options as follows are given:

    soft,timeo=50,retrans=16,vers=3

The resultant NFSACL client receives:

    cl_softrtry: 1
    cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0

These values lead to NFSACL operations not being retried under the
condition of transient network outages with soft mount. Instead, getacl
call fails after 60 seconds with EIO.

The simple fix is to copy `cl_timeout` and make sure `cl_rtt_default`
is initialized from it.

Cc: Chuck Lever <chuck.lever@xxxxxxxxxx>
Cc: Benjamin Coddington <bcodding@xxxxxxxxxx>
Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@xxxxxxxxx/T/
Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
Signed-off-by: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
---
 net/sunrpc/clnt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cda0935a68c9..75faf1f05a14 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -669,6 +669,9 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args,
 	new->cl_chatty = clnt->cl_chatty;
 	new->cl_principal = clnt->cl_principal;
 	new->cl_max_connect = clnt->cl_max_connect;
+	new->cl_timeout = clnt->cl_timeout;
+	rpc_init_rtt(&clnt->cl_rtt_default, clnt->cl_timeout->to_initval);
+
 	return new;
 
 out_err:
-- 
2.39.3


-- 
Dan Aloni




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux