> On Apr 1, 2022, at 11:21 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Thu, Mar 31, 2022 at 09:02:04AM -0700, Dai Ngo wrote: >> Update find_clp_in_name_tree to check and expire courtesy client. >> >> Update find_confirmed_client_by_name to discard the courtesy >> client by setting CLIENT_EXPIRED. >> >> Update nfsd4_setclientid to expire client with CLIENT_EXPIRED >> state to prevent multiple confirmed clients with the same name >> on the conf_name_tree. >> >> Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> >> --- >> fs/nfsd/nfs4state.c | 27 ++++++++++++++++++++++++--- >> fs/nfsd/state.h | 22 ++++++++++++++++++++++ >> 2 files changed, 46 insertions(+), 3 deletions(-) >> >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index fe8969ba94b3..eadce5d19473 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -2893,8 +2893,11 @@ find_clp_in_name_tree(struct xdr_netobj *name, struct rb_root *root) >> node = node->rb_left; >> else if (cmp < 0) >> node = node->rb_right; >> - else >> + else { >> + if (nfsd4_courtesy_clnt_expired(clp)) >> + return NULL; >> return clp; >> + } >> } >> return NULL; >> } >> @@ -2973,8 +2976,15 @@ static bool clp_used_exchangeid(struct nfs4_client *clp) >> static struct nfs4_client * >> find_confirmed_client_by_name(struct xdr_netobj *name, struct nfsd_net *nn) >> { >> + struct nfs4_client *clp; >> + >> lockdep_assert_held(&nn->client_lock); >> - return find_clp_in_name_tree(name, &nn->conf_name_tree); >> + clp = find_clp_in_name_tree(name, &nn->conf_name_tree); >> + if (clp && clp->cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) { >> + nfsd4_discard_courtesy_clnt(clp); >> + clp = NULL; >> + } >> + return clp; >> } >> > .... >> +static inline void >> +nfsd4_discard_courtesy_clnt(struct nfs4_client *clp) >> +{ >> + spin_lock(&clp->cl_cs_lock); >> + clp->cl_cs_client_state = NFSD4_CLIENT_EXPIRED; >> + spin_unlock(&clp->cl_cs_lock); >> +} > > This is a red flag to me.... What guarantees that the condition we just > checked (cl_cs_client_state == NFSD4_CLIENT_RECONNECTED) is still true > here? Why couldn't another thread have raced in and called > reactivate_courtesy_client? > > Should we be holding cl_cs_lock across both the cl_cs_client_state and > the assignment? Holding cl_cs_lock while checking cl_cs_client_state and then updating it sounds OK to me. > Or should reactivate_courtesy_client be taking the > client_lock? > > I'm still not clear on the need for the CLIENT_RECONNECTED state. > > I think analysis would be a bit simpler if the only states were ACTIVE, > COURTESY, and EXPIRED, the only transitions possible were > > ACTIVE->COURTESY->{EXPIRED or ACTIVE} > > and the same lock ensured that those were the only possible transitions. Yes, that would be easier, but I don't think it's realistic. There are lock ordering issues between nn->client_lock and the locks on the individual files and state that make it onerous. > (And to be honest I'd still prefer the original approach where we expire > clients from the posix locking code and then retry. It handles an > additional case (the one where reboot happens after a long network > partition), and I don't think it requires adding these new client > states....) The locking of the earlier approach was unworkable. But, I'm happy to consider that again if you can come up with a way of handling it properly and simply. -- Chuck Lever