On 3/29/22 9:30 AM, J. Bruce Fields wrote:
On Tue, Mar 29, 2022 at 09:20:02AM -0700, dai.ngo@xxxxxxxxxx wrote:
On 3/29/22 8:47 AM, J. Bruce Fields wrote:
On Thu, Mar 24, 2022 at 09:34:42PM -0700, Dai Ngo wrote:
Update nfs4_client to add:
. cl_cs_client_state: courtesy client state
. cl_cs_lock: spinlock to synchronize access to cl_cs_client_state
. cl_cs_list: list used by laundromat to process courtesy clients
Modify alloc_client to initialize these fields.
Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
fs/nfsd/nfs4state.c | 2 ++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/state.h | 33 +++++++++++++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 234e852fcdfa..a65d59510681 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2009,12 +2009,14 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
INIT_LIST_HEAD(&clp->cl_delegations);
INIT_LIST_HEAD(&clp->cl_lru);
INIT_LIST_HEAD(&clp->cl_revoked);
+ INIT_LIST_HEAD(&clp->cl_cs_list);
#ifdef CONFIG_NFSD_PNFS
INIT_LIST_HEAD(&clp->cl_lo_states);
#endif
INIT_LIST_HEAD(&clp->async_copies);
spin_lock_init(&clp->async_lock);
spin_lock_init(&clp->cl_lock);
+ spin_lock_init(&clp->cl_cs_lock);
rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
return clp;
err_no_hashtbl:
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 4fc1fd639527..23996c6ca75e 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -336,6 +336,7 @@ void nfsd_lockd_shutdown(void);
#define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */
#define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */
+#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */
/*
* The following attributes are currently not supported by the NFSv4 server:
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 95457cfd37fc..40e390abc842 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -283,6 +283,35 @@ struct nfsd4_sessionid {
#define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */
/*
+ * CLIENT_ CLIENT_ CLIENT_
+ * COURTESY EXPIRED RECONNECTED Meaning Where set
+ * -----------------------------------------------------------------------------
+ * | false | false | false | Confirmed, active | Default |
+ * |---------------------------------------------------------------------------|
+ * | true | false | false | Courtesy state. | nfs4_get_client_reaplist |
+ * | | | | Lease/lock/share | |
+ * | | | | reservation conflict | |
+ * | | | | can cause Courtesy | |
+ * | | | | client to be expired | |
+ * |---------------------------------------------------------------------------|
+ * | false | true | false | Courtesy client to be| nfs4_laundromat |
+ * | | | | expired by Laundromat| nfsd4_lm_lock_expired |
+ * | | | | due to conflict | nfsd4_discard_courtesy_clnt |
+ * | | | | | nfsd4_expire_courtesy_clnt |
+ * |---------------------------------------------------------------------------|
+ * | false | false | true | Courtesy client | nfsd4_courtesy_clnt_expired|
+ * | | | | reconnected, | |
+ * | | | | becoming active | |
+ * -----------------------------------------------------------------------------
By the way, where is a client returned to the normal (0) state? That
has to happen at some point.
For 4.1 courtesy client reconnects is detected in nfsd4_sequence,
nfsd4_bind_conn_to_session. For 4.0 courtesy client reconnects is
detected in set_client.
How is CLIENT_EXPIRED treated differently from cl_time == 0, and why?
cl_time == 0 means the client is being destroyed (almost) immediately
either by the laundromat or force_expire_client.
CLIENT_EXPIRED means the client will be destroyed by the laundromat
and this depends on when the laundromat runs. When we set CLIENT_COURTESY
we don't clear cl_time since the client is not really expired yet.
We could replace CLIENT_EXPIRED with (cl_time == 0). However,
to set cl_time = 0 we need to acquire the nn->client_lock which
causes deadlock when we try to resolve lock conflicts
from nfs4_resolve_deny_conflicts_locked (fp->fi_lock -> nn_clientlock).
We use the cl_cs_lock to set CLIENT_EXPIRED.
Why are RECONNECTED clients discarded in so many cases? (E.g. whenever
a bind_conn_to_session fails).
find_in_sessionid_hashtbl: we discard the courtesy client when it
reconnects and there is error from nfsd4_get_session_locked. This
should be a rare condition so rather than reverting the client
state back to courtesy, it is simpler just to discard it.
nfsd4_create_session/find_confirmed_client: I think the only time
the courtesy client sends CREATE_SESSION, before sending the SEQUENCE
to reconnect after missing its leases, is when it wants to do clientid
trunking. This should be a rare condition so instead of dealing
with it we just do not allow it and discard the client for now.
nfsd4_destroy_clientid/find_confirmed_client: instead of destroy
the courtesy client here we just let the laundromat destroy it
as if the client already expired.
nfsd4_setclientid_confirm/find_confirmed_client: there should not
be any courtesy client found from nfsd4_setclientid_confirm, it
should be detected and discarded in nfsd4_setclientid.
-Dai
These are mutually exclusive values, not bits that may set to 0 or 1, so
the three boolean columns are confusing. I'd just structure the table
like:
client state meaning where set
0 Confirmed, active Default
CLIENT_COURTESY Courtesy state.... nfs4_get_client_reaplist
CLIENT_EXPIRED Courtesy client to be.. nfs4_laundromat
etc.
will fix in v19.
Thanks,
-Dai
--b.
+ */
+
+enum courtesy_client_state {
+ NFSD4_CLIENT_COURTESY = 1,
+ NFSD4_CLIENT_EXPIRED,
+ NFSD4_CLIENT_RECONNECTED,
+};
+
+/*
* struct nfs4_client - one per client. Clientids live here.
*
* The initial object created by an NFS client using SETCLIENTID (for NFSv4.0)
@@ -385,6 +414,10 @@ struct nfs4_client {
struct list_head async_copies; /* list of async copies */
spinlock_t async_lock; /* lock for async copies */
atomic_t cl_cb_inflight; /* Outstanding callbacks */
+
+ enum courtesy_client_state cl_cs_client_state;
+ spinlock_t cl_cs_lock;
+ struct list_head cl_cs_list;
};
/* struct nfs4_client_reset
--
2.9.5