> On Mar 29, 2022, at 2:39 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Tue, Mar 29, 2022 at 11:19:51AM -0700, dai.ngo@xxxxxxxxxx wrote:
>>
>> On 3/29/22 9:30 AM, J. Bruce Fields wrote:
>>> On Tue, Mar 29, 2022 at 09:20:02AM -0700, dai.ngo@xxxxxxxxxx wrote:
>>>> On 3/29/22 8:47 AM, J. Bruce Fields wrote:
>>>>> On Thu, Mar 24, 2022 at 09:34:42PM -0700, Dai Ngo wrote:
>>>>>> Update nfs4_client to add:
>>>>>>  . cl_cs_client_state: courtesy client state
>>>>>>  . cl_cs_lock: spinlock to synchronize access to cl_cs_client_state
>>>>>>  . cl_cs_list: list used by laundromat to process courtesy clients
>>>>>>
>>>>>> Modify alloc_client to initialize these fields.
>>>>>>
>>>>>> Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
>>>>>> ---
>>>>>>  fs/nfsd/nfs4state.c |  2 ++
>>>>>>  fs/nfsd/nfsd.h      |  1 +
>>>>>>  fs/nfsd/state.h     | 33 +++++++++++++++++++++++++++++++++
>>>>>>  3 files changed, 36 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>>>> index 234e852fcdfa..a65d59510681 100644
>>>>>> --- a/fs/nfsd/nfs4state.c
>>>>>> +++ b/fs/nfsd/nfs4state.c
>>>>>> @@ -2009,12 +2009,14 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
>>>>>>      INIT_LIST_HEAD(&clp->cl_delegations);
>>>>>>      INIT_LIST_HEAD(&clp->cl_lru);
>>>>>>      INIT_LIST_HEAD(&clp->cl_revoked);
>>>>>> +    INIT_LIST_HEAD(&clp->cl_cs_list);
>>>>>>  #ifdef CONFIG_NFSD_PNFS
>>>>>>      INIT_LIST_HEAD(&clp->cl_lo_states);
>>>>>>  #endif
>>>>>>      INIT_LIST_HEAD(&clp->async_copies);
>>>>>>      spin_lock_init(&clp->async_lock);
>>>>>>      spin_lock_init(&clp->cl_lock);
>>>>>> +    spin_lock_init(&clp->cl_cs_lock);
>>>>>>      rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
>>>>>>      return clp;
>>>>>>  err_no_hashtbl:
>>>>>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
>>>>>> index 4fc1fd639527..23996c6ca75e 100644
>>>>>> --- a/fs/nfsd/nfsd.h
>>>>>> +++ b/fs/nfsd/nfsd.h
>>>>>> @@ -336,6 +336,7 @@ void nfsd_lockd_shutdown(void);
>>>>>>  #define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */
>>>>>>  #define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */
>>>>>> +#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */
>>>>>>  /*
>>>>>>   * The following attributes are currently not supported by the NFSv4 server:
>>>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>>>>> index 95457cfd37fc..40e390abc842 100644
>>>>>> --- a/fs/nfsd/state.h
>>>>>> +++ b/fs/nfsd/state.h
>>>>>> @@ -283,6 +283,35 @@ struct nfsd4_sessionid {
>>>>>>  #define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */
>>>>>>  /*
>>>>>> + *   CLIENT_   CLIENT_  CLIENT_
>>>>>> + *   COURTESY  EXPIRED  RECONNECTED  Meaning               Where set
>>>>>> + * -----------------------------------------------------------------------------
>>>>>> + * | false | false | false | Confirmed, active    | Default                    |
>>>>>> + * |---------------------------------------------------------------------------|
>>>>>> + * | true  | false | false | Courtesy state.      | nfs4_get_client_reaplist   |
>>>>>> + * |       |       |       | Lease/lock/share     |                            |
>>>>>> + * |       |       |       | reservation conflict |                            |
>>>>>> + * |       |       |       | can cause Courtesy   |                            |
>>>>>> + * |       |       |       | client to be expired |                            |
>>>>>> + * |---------------------------------------------------------------------------|
>>>>>> + * | false | true  | false | Courtesy client to be| nfs4_laundromat            |
>>>>>> + * |       |       |       | expired by Laundromat| nfsd4_lm_lock_expired      |
>>>>>> + * |       |       |       | due to conflict      | nfsd4_discard_courtesy_clnt|
>>>>>> + * |       |       |       |                      | nfsd4_expire_courtesy_clnt |
>>>>>> + * |---------------------------------------------------------------------------|
>>>>>> + * | false | false | true  | Courtesy client      | nfsd4_courtesy_clnt_expired|
>>>>>> + * |       |       |       | reconnected,         |                            |
>>>>>> + * |       |       |       | becoming active      |                            |
>>>>>> + * -----------------------------------------------------------------------------
>>> By the way, where is a client returned to the normal (0) state? That
>>> has to happen at some point.
>>
>> For 4.1 courtesy client reconnects is detected in nfsd4_sequence,
>> nfsd4_bind_conn_to_session.
>
> Those are the places where NFSD4_CLIENT_RECONNECTED is set, which isn't
> the question I asked.

"reconnected" simply means the client has gotten back in touch. The
server then has to decide whether to allow the client to become active
again or it needs to purge it.

That decision is different for each operation and minor version. Look
for "if (cl_cs_client_state == NFSD4_CLIENT_RECONNECTED)" for how those
choices are made.


>>> Why are RECONNECTED clients discarded in so many cases? (E.g. whenever
>>> a bind_conn_to_session fails).
>>
>> find_in_sessionid_hashtbl: we discard the courtesy client when it
>> reconnects and there is error from nfsd4_get_session_locked. This
>> should be a rare condition so rather than reverting the client
>> state back to courtesy, it is simpler just to discard it.
>
> That may be a rare situation, but I don't believe the behavior of
> discarding the client in this case is correct.

Can you explain this? It's a courtesy client... the server can
decide it's expired at that point, can't it? IOW what breaks?


>> nfsd4_create_session/find_confirmed_client: I think the only time
>> the courtesy client sends CREATE_SESSION, before sending the SEQUENCE
>> to reconnect after missing its leases, is when it wants to do clientid
>> trunking. This should be a rare condition so instead of dealing
>> with it we just do not allow it and discard the client for now.
>
> We can't wave away incorrect behavior with "but it's rare". Users with
> heavy and/or unusual workloads hit rare conditions. Clients may change
> their behavior over time. (E.g., trunking may become more common.)

--
Chuck Lever
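
A rough user-space sketch of the state handling discussed above, in case it helps to see it in one place. This is not code from the patch: enum cs_state, the CS_* constants, struct model_client and the cs_* helpers are all hypothetical names, the trigger for the active -> courtesy step is an assumption, and locking (cl_cs_lock in the patch) is left out entirely. It only models the four rows of the table and the point that RECONNECTED is a transient state the server has to resolve, either back to active or to expired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the patch's real definitions are not shown in this thread. */
enum cs_state {
	CS_ACTIVE = 0,		/* confirmed, active (the all-false row) */
	CS_COURTESY,		/* lease lapsed, state kept as a courtesy */
	CS_EXPIRED,		/* marked for the laundromat to purge */
	CS_RECONNECTED,		/* courtesy client has contacted the server again */
};

struct model_client {
	enum cs_state state;
};

/* Assumed trigger: lease ran out with no conflict, so active -> courtesy. */
void cs_lease_expired(struct model_client *clp)
{
	if (clp->state == CS_ACTIVE)
		clp->state = CS_COURTESY;
}

/* A lease/lock/share reservation conflict expires a courtesy client. */
void cs_conflict(struct model_client *clp)
{
	if (clp->state == CS_COURTESY)
		clp->state = CS_EXPIRED;
}

/*
 * The client sent a request. A courtesy client becomes RECONNECTED;
 * an already expired client is refused (returns false).
 */
bool cs_client_seen(struct model_client *clp)
{
	switch (clp->state) {
	case CS_COURTESY:
		clp->state = CS_RECONNECTED;
		return true;
	case CS_EXPIRED:
		return false;
	default:
		return true;
	}
}

/*
 * Per-operation decision: let the reconnected client become active
 * again (back to 0), or purge it.
 */
void cs_resolve_reconnect(struct model_client *clp, bool allow)
{
	assert(clp->state == CS_RECONNECTED);
	clp->state = allow ? CS_ACTIVE : CS_EXPIRED;
}
```

In this model the only way back to state 0 is cs_resolve_reconnect(clp, true), and each operation would have to say why it passes true or false.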