On Sun, 19 Jan 2025, Chuck Lever wrote: > On 12/11/24 4:47 PM, NeilBrown wrote: > > Reducing the number of slots in the session slot table requires > > confirmation from the client. This patch adds reduce_session_slots() > > which starts the process of getting confirmation, but never calls it. > > That will come in a later patch. > > > > Before we can free a slot we need to confirm that the client won't try > > to use it again. This involves returning a lower cr_maxrequests in a > > SEQUENCE reply and then seeing a ca_maxrequests on the same slot which > > is not larger than we limit we are trying to impose. So for each slot > > we need to remember that we have sent a reduced cr_maxrequests. > > > > To achieve this we introduce a concept of request "generations". Each > > time we decide to reduce cr_maxrequests we increment the generation > > number, and record this when we return the lower cr_maxrequests to the > > client. When a slot with the current generation reports a low > > ca_maxrequests, we commit to that level and free extra slots. > > > > We use an 16 bit generation number (64 seems wasteful) and if it cycles > > we iterate all slots and reset the generation number to avoid false matches. > > > > When we free a slot we store the seqid in the slot pointer so that it can > > be restored when we reactivate the slot. The RFC can be read as > > suggesting that the slot number could restart from one after a slot is > > retired and reactivated, but also suggests that retiring slots is not > > required. So when we reactive a slot we accept with the next seqid in > > sequence, or 1. > > > > When decoding sa_highest_slotid into maxslots we need to add 1 - this > > matches how it is encoded for the reply. > > > > se_dead is moved in struct nfsd4_session to remove a hole. > > > > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > fs/nfsd/nfs4state.c | 94 ++++++++++++++++++++++++++++++++++++++++----- > > fs/nfsd/nfs4xdr.c | 5 ++- > > fs/nfsd/state.h | 6 ++- > > fs/nfsd/xdr4.h | 2 - > > 4 files changed, 92 insertions(+), 15 deletions(-) > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index fd9473d487f3..a2d1f97b8a0e 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -1910,17 +1910,69 @@ gen_sessionid(struct nfsd4_session *ses) > > #define NFSD_MIN_HDR_SEQ_SZ (24 + 12 + 44) > > > > static void > > -free_session_slots(struct nfsd4_session *ses) > > +free_session_slots(struct nfsd4_session *ses, int from) > > { > > int i; > > > > - for (i = 0; i < ses->se_fchannel.maxreqs; i++) { > > + if (from >= ses->se_fchannel.maxreqs) > > + return; > > + > > + for (i = from; i < ses->se_fchannel.maxreqs; i++) { > > struct nfsd4_slot *slot = xa_load(&ses->se_slots, i); > > > > - xa_erase(&ses->se_slots, i); > > + /* > > + * Save the seqid in case we reactivate this slot. > > + * This will never require a memory allocation so GFP > > + * flag is irrelevant > > + */ > > + xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid), 0); > > free_svc_cred(&slot->sl_cred); > > kfree(slot); > > } > > + ses->se_fchannel.maxreqs = from; > > + if (ses->se_target_maxslots > from) > > + ses->se_target_maxslots = from; > > +} > > + > > +/** > > + * reduce_session_slots - reduce the target max-slots of a session if possible > > + * @ses: The session to affect > > + * @dec: how much to decrease the target by > > + * > > + * This interface can be used by a shrinker to reduce the target max-slots > > + * for a session so that some slots can eventually be freed. > > + * It uses spin_trylock() as it may be called in a context where another > > + * spinlock is held that has a dependency on client_lock. As shrinkers are > > + * best-effort, skiping a session is client_lock is already held has no > > + * great coast > > + * > > + * Return value: > > + * The number of slots that the target was reduced by. > > + */ > > +static int __maybe_unused > > +reduce_session_slots(struct nfsd4_session *ses, int dec) > > +{ > > + struct nfsd_net *nn = net_generic(ses->se_client->net, > > + nfsd_net_id); > > + int ret = 0; > > + > > + if (ses->se_target_maxslots <= 1) > > + return ret; > > + if (!spin_trylock(&nn->client_lock)) > > + return ret; > > + ret = min(dec, ses->se_target_maxslots-1); > > + ses->se_target_maxslots -= ret; > > + ses->se_slot_gen += 1; > > + if (ses->se_slot_gen == 0) { > > + int i; > > + ses->se_slot_gen = 1; > > + for (i = 0; i < ses->se_fchannel.maxreqs; i++) { > > + struct nfsd4_slot *slot = xa_load(&ses->se_slots, i); > > + slot->sl_generation = 0; > > + } > > + } > > + spin_unlock(&nn->client_lock); > > + return ret; > > } > > > > /* > > @@ -1968,6 +2020,7 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs, > > } > > fattrs->maxreqs = i; > > memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs)); > > + new->se_target_maxslots = i; > > new->se_cb_slot_avail = ~0U; > > new->se_cb_highest_slot = min(battrs->maxreqs - 1, > > NFSD_BC_SLOT_TABLE_SIZE - 1); > > @@ -2081,7 +2134,7 @@ static void nfsd4_del_conns(struct nfsd4_session *s) > > > > static void __free_session(struct nfsd4_session *ses) > > { > > - free_session_slots(ses); > > + free_session_slots(ses, 0); > > xa_destroy(&ses->se_slots); > > kfree(ses); > > } > > @@ -2684,6 +2737,9 @@ static int client_info_show(struct seq_file *m, void *v) > > seq_printf(m, "session slots:"); > > list_for_each_entry(ses, &clp->cl_sessions, se_perclnt) > > seq_printf(m, " %u", ses->se_fchannel.maxreqs); > > + seq_printf(m, "\nsession target slots:"); > > + list_for_each_entry(ses, &clp->cl_sessions, se_perclnt) > > + seq_printf(m, " %u", ses->se_target_maxslots); > > spin_unlock(&clp->cl_lock); > > seq_puts(m, "\n"); > > > > @@ -3674,10 +3730,10 @@ nfsd4_exchange_id_release(union nfsd4_op_u *u) > > kfree(exid->server_impl_name); > > } > > > > -static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, bool slot_inuse) > > +static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, u8 flags) > > { > > /* The slot is in use, and no response has been sent. */ > > - if (slot_inuse) { > > + if (flags & NFSD4_SLOT_INUSE) { > > if (seqid == slot_seqid) > > return nfserr_jukebox; > > else > > @@ -3686,6 +3742,8 @@ static __be32 check_slot_seqid(u32 seqid, u32 slot_seqid, bool slot_inuse) > > /* Note unsigned 32-bit arithmetic handles wraparound: */ > > if (likely(seqid == slot_seqid + 1)) > > return nfs_ok; > > + if ((flags & NFSD4_SLOT_REUSED) && seqid == 1) > > + return nfs_ok; > > if (seqid == slot_seqid) > > return nfserr_replay_cache; > > return nfserr_seq_misordered; > > @@ -4236,8 +4294,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > dprintk("%s: slotid %d\n", __func__, seq->slotid); > > > > trace_nfsd_slot_seqid_sequence(clp, seq, slot); > > - status = check_slot_seqid(seq->seqid, slot->sl_seqid, > > - slot->sl_flags & NFSD4_SLOT_INUSE); > > + status = check_slot_seqid(seq->seqid, slot->sl_seqid, slot->sl_flags); > > if (status == nfserr_replay_cache) { > > status = nfserr_seq_misordered; > > if (!(slot->sl_flags & NFSD4_SLOT_INITIALIZED)) > > @@ -4262,6 +4319,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > if (status) > > goto out_put_session; > > > > + if (session->se_target_maxslots < session->se_fchannel.maxreqs && > > + slot->sl_generation == session->se_slot_gen && > > + seq->maxslots <= session->se_target_maxslots) > > + /* Client acknowledged our reduce maxreqs */ > > + free_session_slots(session, session->se_target_maxslots); > > + > > buflen = (seq->cachethis) ? > > session->se_fchannel.maxresp_cached : > > session->se_fchannel.maxresp_sz; > > @@ -4272,9 +4335,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > svc_reserve(rqstp, buflen); > > > > status = nfs_ok; > > - /* Success! bump slot seqid */ > > + /* Success! accept new slot seqid */ > > slot->sl_seqid = seq->seqid; > > + slot->sl_flags &= ~NFSD4_SLOT_REUSED; > > slot->sl_flags |= NFSD4_SLOT_INUSE; > > + slot->sl_generation = session->se_slot_gen; > > if (seq->cachethis) > > slot->sl_flags |= NFSD4_SLOT_CACHETHIS; > > else > > @@ -4291,9 +4356,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > * the client might use. > > */ > > if (seq->slotid == session->se_fchannel.maxreqs - 1 && > > + session->se_target_maxslots >= session->se_fchannel.maxreqs && > > session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) { > > int s = session->se_fchannel.maxreqs; > > int cnt = DIV_ROUND_UP(s, 5); > > + void *prev_slot; > > > > do { > > /* > > @@ -4303,18 +4370,25 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > */ > > slot = kzalloc(slot_bytes(&session->se_fchannel), > > GFP_NOWAIT); > > + prev_slot = xa_load(&session->se_slots, s); > > + if (xa_is_value(prev_slot) && slot) { > > + slot->sl_seqid = xa_to_value(prev_slot); > > + slot->sl_flags |= NFSD4_SLOT_REUSED; > > + } > > if (slot && > > !xa_is_err(xa_store(&session->se_slots, s, slot, > > GFP_NOWAIT))) { > > s += 1; > > session->se_fchannel.maxreqs = s; > > + session->se_target_maxslots = s; > > } else { > > kfree(slot); > > slot = NULL; > > } > > } while (slot && --cnt > 0); > > } > > - seq->maxslots = session->se_fchannel.maxreqs; > > + seq->maxslots = max(session->se_target_maxslots, seq->maxslots); > > + seq->target_maxslots = session->se_target_maxslots; > > > > out: > > switch (clp->cl_cb_state) { > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > index 53fac037611c..4dcb03cd9292 100644 > > --- a/fs/nfsd/nfs4xdr.c > > +++ b/fs/nfsd/nfs4xdr.c > > @@ -1884,7 +1884,8 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, > > return nfserr_bad_xdr; > > seq->seqid = be32_to_cpup(p++); > > seq->slotid = be32_to_cpup(p++); > > - seq->maxslots = be32_to_cpup(p++); > > + /* sa_highest_slotid counts from 0 but maxslots counts from 1 ... */ > > + seq->maxslots = be32_to_cpup(p++) + 1; > > seq->cachethis = be32_to_cpup(p); > > > > seq->status_flags = 0; > > @@ -4968,7 +4969,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, > > if (nfserr != nfs_ok) > > return nfserr; > > /* sr_target_highest_slotid */ > > - nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1); > > + nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1); > > if (nfserr != nfs_ok) > > return nfserr; > > /* sr_status_flags */ > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > > index aad547d3ad8b..4251ff3c5ad1 100644 > > --- a/fs/nfsd/state.h > > +++ b/fs/nfsd/state.h > > @@ -245,10 +245,12 @@ struct nfsd4_slot { > > struct svc_cred sl_cred; > > u32 sl_datalen; > > u16 sl_opcnt; > > + u16 sl_generation; > > #define NFSD4_SLOT_INUSE (1 << 0) > > #define NFSD4_SLOT_CACHETHIS (1 << 1) > > #define NFSD4_SLOT_INITIALIZED (1 << 2) > > #define NFSD4_SLOT_CACHED (1 << 3) > > +#define NFSD4_SLOT_REUSED (1 << 4) > > u8 sl_flags; > > char sl_data[]; > > }; > > @@ -321,7 +323,6 @@ struct nfsd4_session { > > u32 se_cb_slot_avail; /* bitmap of available slots */ > > u32 se_cb_highest_slot; /* highest slot client wants */ > > u32 se_cb_prog; > > - bool se_dead; > > struct list_head se_hash; /* hash by sessionid */ > > struct list_head se_perclnt; > > struct nfs4_client *se_client; > > @@ -331,6 +332,9 @@ struct nfsd4_session { > > struct list_head se_conns; > > u32 se_cb_seq_nr[NFSD_BC_SLOT_TABLE_SIZE]; > > struct xarray se_slots; /* forward channel slots */ > > + u16 se_slot_gen; > > + bool se_dead; > > + u32 se_target_maxslots; > > }; > > > > /* formatted contents of nfs4_sessionid */ > > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h > > index 382cc1389396..c26ba86dbdfd 100644 > > --- a/fs/nfsd/xdr4.h > > +++ b/fs/nfsd/xdr4.h > > @@ -576,9 +576,7 @@ struct nfsd4_sequence { > > u32 slotid; /* request/response */ > > u32 maxslots; /* request/response */ > > u32 cachethis; /* request */ > > -#if 0 > > u32 target_maxslots; /* response */ > > -#endif /* not yet */ > > u32 status_flags; /* response */ > > }; > > > > Hi Neil - > > I've found some misbehavior which I've bisected to this commit. > > If disconnect injection is set up to break the connection every 25,000 > RPCs or so, xfstests running on an NFSv4.1 mount will eventually stall > after this commit is applied. > > Network capture shows that the server eventually starts returning > SEQ_MISORDERED because the client has forgotten an retired slot after a > disconnect, and tries to use sequence number 1 for that slot with a new > operation. > > I've narrowed the issue down to nfs41_is_outlier_target_slotid() on the > client. This function uses a bit of calculus to decide when to bump the > slot table's generation number. In the failing case, it never bumps the > generation, and that results in the client freeing slots it shouldn't. > The server's reported { highest, target_highest } slot numbers don't > appear to change correctly after the client has reconnected. > > If I revert this hunk from 5/6: > > @@ -4968,7 +4969,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres > *resp, __be32 nfserr, > if (nfserr != nfs_ok) > return nfserr; > /* sr_target_highest_slotid */ > - nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1); > + nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1); > if (nfserr != nfs_ok) > return nfserr; > /* sr_status_flags */ > > the reproducer above runs to completion in the expected amount of time. > > > The high order bit here is whether I should drop these patches for > v6.14, or whether you believe you can come up with a narrow solution > during the early part of v6.14-rc that I can include in an -rc update > for NFSD. I can't really tell if a significant amount of surgery will > be necessary. > > What do you think? I think I can fix it. It sounds like it might be a client bug, or it might be a difference in interpretation of the spec, or it might be an ambiguity in the spec. But I think I can fix it by setting NFSD_SLOT_REUSED in any slots beyond ->target_maxslots when I reduce target_maxslots. That makes it so that we accept either 1 or the next-in-sequence seqid. Doing that does open up a possible risk of a resend being accepted as a new request. As it is a new connection, resends are a real possibility. I'll have to ponder that a bit more and might need to be careful about retiring slots with a low seqid which have been used recently. Maybe I need to store the time when seqid 1 was used, and not return a slot until some minimum time after that timestamp. I'll try to get you a patch before the end of next week. Thanks, NeilBrown