On Thu, Mar 31, 2022 at 09:02:04AM -0700, Dai Ngo wrote: > Update find_clp_in_name_tree to check and expire courtesy client. > > Update find_confirmed_client_by_name to discard the courtesy > client by setting CLIENT_EXPIRED. > > Update nfsd4_setclientid to expire client with CLIENT_EXPIRED > state to prevent multiple confirmed clients with the same name > on the conf_name_tree. > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> > --- > fs/nfsd/nfs4state.c | 27 ++++++++++++++++++++++++--- > fs/nfsd/state.h | 22 ++++++++++++++++++++++ > 2 files changed, 46 insertions(+), 3 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index fe8969ba94b3..eadce5d19473 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -2893,8 +2893,11 @@ find_clp_in_name_tree(struct xdr_netobj *name, struct rb_root *root) > node = node->rb_left; > else if (cmp < 0) > node = node->rb_right; > - else > + else { > + if (nfsd4_courtesy_clnt_expired(clp)) > + return NULL; > return clp; > + } > } > return NULL; > } > @@ -2973,8 +2976,15 @@ static bool clp_used_exchangeid(struct nfs4_client *clp) > static struct nfs4_client * > find_confirmed_client_by_name(struct xdr_netobj *name, struct nfsd_net *nn) > { > + struct nfs4_client *clp; > + > lockdep_assert_held(&nn->client_lock); > - return find_clp_in_name_tree(name, &nn->conf_name_tree); > + clp = find_clp_in_name_tree(name, &nn->conf_name_tree); > + if (clp && clp->cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) { > + nfsd4_discard_courtesy_clnt(clp); > + clp = NULL; > + } > + return clp; > } > .... > +static inline void > +nfsd4_discard_courtesy_clnt(struct nfs4_client *clp) > +{ > + spin_lock(&clp->cl_cs_lock); > + clp->cl_cs_client_state = NFSD4_CLIENT_EXPIRED; > + spin_unlock(&clp->cl_cs_lock); > +} This is a red flag to me.... What guarantees that the condition we just checked (cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) is still true here? Why couldn't another thread have raced in and called reactivate_courtesy_client? Should we be holding cl_cs_lock across both the cl_cs_client_state and the assignment? Or should reactivate_courtesy_client be taking the client_lock? I'm still not clear on the need for the CLIENT_RECONNECTED state. I think analysis would be a bit simpler if the only states were ACTIVE, COURTESY, and EXPIRED, the only transitions possible were ACTIVE->COURTESY->{EXPIRED or ACTIVE} and the same lock ensured that those were the only possible transitions. (And to be honest I'd still prefer the original approach where we expire clients from the posix locking code and then retry. It handles an additional case (the one where reboot happens after a long network partition), and I don't think it requires adding these new client states....) --b.