Re: [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots

"NeilBrown" <neilb@xxxxxxx> · Tue, 03 Dec 2024 15:28:33 +1100

On Tue, 03 Dec 2024, Chuck Lever III wrote:
> 
> 
> > On Nov 21, 2024, at 5:29 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> > 
> > 
> > 
> >> On Nov 21, 2024, at 4:47 PM, NeilBrown <neilb@xxxxxxx> wrote:
> >> 
> >> On Wed, 20 Nov 2024, Chuck Lever wrote:
> >>> On Wed, Nov 20, 2024 at 09:35:00AM +1100, NeilBrown wrote:
> >>>> On Wed, 20 Nov 2024, Chuck Lever wrote:
> >>>>> On Tue, Nov 19, 2024 at 11:41:32AM +1100, NeilBrown wrote:
> >>>>>> Reducing the number of slots in the session slot table requires
> >>>>>> confirmation from the client.  This patch adds reduce_session_slots()
> >>>>>> which starts the process of getting confirmation, but never calls it.
> >>>>>> That will come in a later patch.
> >>>>>> 
> >>>>>> Before we can free a slot we need to confirm that the client won't try
> >>>>>> to use it again.  This involves returning a lower cr_maxrequests in a
> >>>>>> SEQUENCE reply and then seeing a ca_maxrequests on the same slot which
> >>>>>> is not larger than we limit we are trying to impose.  So for each slot
> >>>>>> we need to remember that we have sent a reduced cr_maxrequests.
> >>>>>> 
> >>>>>> To achieve this we introduce a concept of request "generations".  Each
> >>>>>> time we decide to reduce cr_maxrequests we increment the generation
> >>>>>> number, and record this when we return the lower cr_maxrequests to the
> >>>>>> client.  When a slot with the current generation reports a low
> >>>>>> ca_maxrequests, we commit to that level and free extra slots.
> >>>>>> 
> >>>>>> We use an 8 bit generation number (64 seems wasteful) and if it cycles
> >>>>>> we iterate all slots and reset the generation number to avoid false matches.
> >>>>>> 
> >>>>>> When we free a slot we store the seqid in the slot pointer so that it can
> >>>>>> be restored when we reactivate the slot.  The RFC can be read as
> >>>>>> suggesting that the slot number could restart from one after a slot is
> >>>>>> retired and reactivated, but also suggests that retiring slots is not
> >>>>>> required.  So when we reactive a slot we accept with the next seqid in
> >>>>>> sequence, or 1.
> >>>>>> 
> >>>>>> When decoding sa_highest_slotid into maxslots we need to add 1 - this
> >>>>>> matches how it is encoded for the reply.
> >>>>>> 
> >>>>>> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> >>>>>> ---
> >>>>>> fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++++++++++++++++-------
> >>>>>> fs/nfsd/nfs4xdr.c   |  5 +--
> >>>>>> fs/nfsd/state.h     |  4 +++
> >>>>>> fs/nfsd/xdr4.h      |  2 --
> >>>>>> 4 files changed, 76 insertions(+), 16 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >>>>>> index fb522165b376..0625b0aec6b8 100644
> >>>>>> --- a/fs/nfsd/nfs4state.c
> >>>>>> +++ b/fs/nfsd/nfs4state.c
> >>>>>> @@ -1910,17 +1910,55 @@ gen_sessionid(struct nfsd4_session *ses)
> >>>>>> #define NFSD_MIN_HDR_SEQ_SZ  (24 + 12 + 44)
> >>>>>> 
> >>>>>> static void
> >>>>>> -free_session_slots(struct nfsd4_session *ses)
> >>>>>> +free_session_slots(struct nfsd4_session *ses, int from)
> >>>>>> {
> >>>>>> int i;
> >>>>>> 
> >>>>>> - for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
> >>>>>> + if (from >= ses->se_fchannel.maxreqs)
> >>>>>> + return;
> >>>>>> +
> >>>>>> + for (i = from; i < ses->se_fchannel.maxreqs; i++) {
> >>>>>> struct nfsd4_slot *slot = xa_load(&ses->se_slots, i);
> >>>>>> 
> >>>>>> - xa_erase(&ses->se_slots, i);
> >>>>>> + /*
> >>>>>> +  * Save the seqid in case we reactivate this slot.
> >>>>>> +  * This will never require a memory allocation so GFP
> >>>>>> +  * flag is irrelevant
> >>>>>> +  */
> >>>>>> + xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid),
> >>>>>> +  GFP_ATOMIC);
> >>>>> 
> >>>>> Again... ATOMIC is probably not what we want here, even if it is
> >>>>> only documentary.
> >>>> 
> >>>> Why not?  It might be called under a spinlock so GFP_KERNEL might trigger
> >>>> a warning.
> >>> 
> >>> I find using GFP_ATOMIC here to be confusing -- it requests
> >>> allocation from special memory reserves and is to be used in
> >>> situations where allocation might result in system failure. That is
> >>> clearly not the case here, and the resulting memory allocation might
> >>> be long-lived.
> >> 
> >> Would you be comfortable with GFP_NOWAIT which leaves out __GFP_HIGH ??
> > 
> > I will be comfortable when I hear back from Matthew and Liam.
> > 
> > :-)
> > 
> > 
> >>> I see the comment that says memory won't actually be allocated. I'm
> >>> not sure that's the way xa_store() works, however.
> >> 
> >> xarray.rst says:
> >> 
> >> The xa_store(), xa_cmpxchg(), xa_alloc(),
> >> xa_reserve() and xa_insert() functions take a gfp_t
> >> parameter in case the XArray needs to allocate memory to store this entry.
> >> If the entry is being deleted, no memory allocation needs to be performed,
> >> and the GFP flags specified will be ignored.`
> >> 
> >> The particular context is that a normal pointer is currently stored a
> >> the given index, and we are replacing that with a number.  The above
> >> doesn't explicitly say that won't require a memory allocation, but I
> >> think it is reasonable to say it won't need "to allocate memory to store
> >> this entry" - as an entry is already stored - so allocation should not
> >> be needed.
> > 
> > xa_mk_value() converts a number to a pointer, and xa_store
> > stores that pointer.
> > 
> > I suspect that xa_store() is allowed to rebalance the
> > xarray's internal data structures, and that could result
> > in memory release or allocation. That's why a GFP flag is
> > one of the arguments.
> 
> Matthew says the xa_store() is guaranteed not to do a memory
> allocation in this case. However, they prefer an annotation
> of the call site with a "0" GFP argument to show that the
> allocation flags are not relevant.
> 
> Does this:
> 
> 	xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid), 0);
> 
> work for you?

Sure.
And it looks like sparse will be happy even though "0" isn't explicitly
"gfp_t" because 0 is "special".

I might prefer GFP_NULL or similar, but 0 certainly works for me.  I'll
include that when I resend.

Thanks
NeilBrown