On 4/1/22 8:57 AM, Chuck Lever III wrote:
On Apr 1, 2022, at 11:21 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Thu, Mar 31, 2022 at 09:02:04AM -0700, Dai Ngo wrote:
Update find_clp_in_name_tree to check for and expire a courtesy client.
Update find_confirmed_client_by_name to discard the courtesy
client by setting NFSD4_CLIENT_EXPIRED.
Update nfsd4_setclientid to expire a client in NFSD4_CLIENT_EXPIRED
state, to prevent multiple confirmed clients with the same name
on the conf_name_tree.
Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
fs/nfsd/nfs4state.c | 27 ++++++++++++++++++++++++---
fs/nfsd/state.h | 22 ++++++++++++++++++++++
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fe8969ba94b3..eadce5d19473 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2893,8 +2893,11 @@ find_clp_in_name_tree(struct xdr_netobj *name, struct rb_root *root)
 			node = node->rb_left;
 		else if (cmp < 0)
 			node = node->rb_right;
-		else
+		else {
+			if (nfsd4_courtesy_clnt_expired(clp))
+				return NULL;
 			return clp;
+		}
 	}
 	return NULL;
 }
@@ -2973,8 +2976,15 @@ static bool clp_used_exchangeid(struct nfs4_client *clp)
 static struct nfs4_client *
 find_confirmed_client_by_name(struct xdr_netobj *name, struct nfsd_net *nn)
 {
+	struct nfs4_client *clp;
+
 	lockdep_assert_held(&nn->client_lock);
-	return find_clp_in_name_tree(name, &nn->conf_name_tree);
+	clp = find_clp_in_name_tree(name, &nn->conf_name_tree);
+	if (clp && clp->cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) {
+		nfsd4_discard_courtesy_clnt(clp);
+		clp = NULL;
+	}
+	return clp;
 }
....
+static inline void
+nfsd4_discard_courtesy_clnt(struct nfs4_client *clp)
+{
+	spin_lock(&clp->cl_cs_lock);
+	clp->cl_cs_client_state = NFSD4_CLIENT_EXPIRED;
+	spin_unlock(&clp->cl_cs_lock);
+}
This is a red flag to me.... What guarantees that the condition we just
checked (cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) is still true
here? Why couldn't another thread have raced in and called
reactivate_courtesy_client?
Should we be holding cl_cs_lock across both the cl_cs_client_state
check and the assignment?
Holding cl_cs_lock while checking cl_cs_client_state and then
updating it sounds OK to me.
Thanks Bruce and Chuck!
I replaced nfsd4_discard_courtesy_clnt with nfsd4_discard_reconnect_clnt
which checks and sets the cl_cs_client_state under the cl_cs_lock:
static inline bool
nfsd4_discard_reconnect_clnt(struct nfs4_client *clp)
{
	bool ret = false;

	spin_lock(&clp->cl_cs_lock);
	if (clp->cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) {
		clp->cl_cs_client_state = NFSD4_CLIENT_EXPIRED;
		ret = true;
	}
	spin_unlock(&clp->cl_cs_lock);
	return ret;
}
Or should reactivate_courtesy_client be taking the
client_lock?
I'm still not clear on the need for the CLIENT_RECONNECTED state.
I think analysis would be a bit simpler if the only states were ACTIVE,
COURTESY, and EXPIRED, the only transitions possible were
ACTIVE->COURTESY->{EXPIRED or ACTIVE}
and the same lock ensured that those were the only possible transitions.
Yes, that would be easier, but I don't think it's realistic. There
are lock ordering issues between nn->client_lock and the locks on
the individual files and state that make it onerous.
(And to be honest I'd still prefer the original approach where we expire
clients from the posix locking code and then retry. It handles an
additional case (the one where reboot happens after a long network
partition), and I don't think it requires adding these new client
states....)
The locking of the earlier approach was unworkable.
But, I'm happy to consider that again if you can come up with a way
of handling it properly and simply.
I will wait for feedback from Bruce before sending v20 with the
above change.
-Dai
--
Chuck Lever