Re: [PATCH 5/6] nfsd: add support for freeing unused session-DRC slots

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Tue, 3 Dec 2024 14:40:16 +0000

> On Dec 2, 2024, at 11:28 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Tue, 03 Dec 2024, Chuck Lever III wrote:
>> 
>> 
>>> On Nov 21, 2024, at 5:29 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>>> On Nov 21, 2024, at 4:47 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>> 
>>>> On Wed, 20 Nov 2024, Chuck Lever wrote:
>>>>> On Wed, Nov 20, 2024 at 09:35:00AM +1100, NeilBrown wrote:
>>>>>> On Wed, 20 Nov 2024, Chuck Lever wrote:
>>>>>>> On Tue, Nov 19, 2024 at 11:41:32AM +1100, NeilBrown wrote:
>>>>>>>> Reducing the number of slots in the session slot table requires
>>>>>>>> confirmation from the client.  This patch adds reduce_session_slots()
>>>>>>>> which starts the process of getting confirmation, but never calls it.
>>>>>>>> That will come in a later patch.
>>>>>>>> 
>>>>>>>> Before we can free a slot we need to confirm that the client won't try
>>>>>>>> to use it again.  This involves returning a lower cr_maxrequests in a
>>>>>>>> SEQUENCE reply and then seeing a ca_maxrequests on the same slot which
>>>>>>>> is not larger than we limit we are trying to impose.  So for each slot
>>>>>>>> we need to remember that we have sent a reduced cr_maxrequests.
>>>>>>>> 
>>>>>>>> To achieve this we introduce a concept of request "generations".  Each
>>>>>>>> time we decide to reduce cr_maxrequests we increment the generation
>>>>>>>> number, and record this when we return the lower cr_maxrequests to the
>>>>>>>> client.  When a slot with the current generation reports a low
>>>>>>>> ca_maxrequests, we commit to that level and free extra slots.
>>>>>>>> 
>>>>>>>> We use an 8 bit generation number (64 seems wasteful) and if it cycles
>>>>>>>> we iterate all slots and reset the generation number to avoid false matches.
>>>>>>>> 
>>>>>>>> When we free a slot we store the seqid in the slot pointer so that it can
>>>>>>>> be restored when we reactivate the slot.  The RFC can be read as
>>>>>>>> suggesting that the slot number could restart from one after a slot is
>>>>>>>> retired and reactivated, but also suggests that retiring slots is not
>>>>>>>> required.  So when we reactive a slot we accept with the next seqid in
>>>>>>>> sequence, or 1.
>>>>>>>> 
>>>>>>>> When decoding sa_highest_slotid into maxslots we need to add 1 - this
>>>>>>>> matches how it is encoded for the reply.
>>>>>>>> 
>>>>>>>> Signed-off-by: NeilBrown <neilb@xxxxxxx>
>>>>>>>> ---
>>>>>>>> fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++++++++++++++++-------
>>>>>>>> fs/nfsd/nfs4xdr.c   |  5 +--
>>>>>>>> fs/nfsd/state.h     |  4 +++
>>>>>>>> fs/nfsd/xdr4.h      |  2 --
>>>>>>>> 4 files changed, 76 insertions(+), 16 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>>>>>> index fb522165b376..0625b0aec6b8 100644
>>>>>>>> --- a/fs/nfsd/nfs4state.c
>>>>>>>> +++ b/fs/nfsd/nfs4state.c
>>>>>>>> @@ -1910,17 +1910,55 @@ gen_sessionid(struct nfsd4_session *ses)
>>>>>>>> #define NFSD_MIN_HDR_SEQ_SZ  (24 + 12 + 44)
>>>>>>>> 
>>>>>>>> static void
>>>>>>>> -free_session_slots(struct nfsd4_session *ses)
>>>>>>>> +free_session_slots(struct nfsd4_session *ses, int from)
>>>>>>>> {
>>>>>>>> int i;
>>>>>>>> 
>>>>>>>> - for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
>>>>>>>> + if (from >= ses->se_fchannel.maxreqs)
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + for (i = from; i < ses->se_fchannel.maxreqs; i++) {
>>>>>>>> struct nfsd4_slot *slot = xa_load(&ses->se_slots, i);
>>>>>>>> 
>>>>>>>> - xa_erase(&ses->se_slots, i);
>>>>>>>> + /*
>>>>>>>> +  * Save the seqid in case we reactivate this slot.
>>>>>>>> +  * This will never require a memory allocation so GFP
>>>>>>>> +  * flag is irrelevant
>>>>>>>> +  */
>>>>>>>> + xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid),
>>>>>>>> +  GFP_ATOMIC);
>>>>>>> 
>>>>>>> Again... ATOMIC is probably not what we want here, even if it is
>>>>>>> only documentary.
>>>>>> 
>>>>>> Why not?  It might be called under a spinlock so GFP_KERNEL might trigger
>>>>>> a warning.
>>>>> 
>>>>> I find using GFP_ATOMIC here to be confusing -- it requests
>>>>> allocation from special memory reserves and is to be used in
>>>>> situations where allocation might result in system failure. That is
>>>>> clearly not the case here, and the resulting memory allocation might
>>>>> be long-lived.
>>>> 
>>>> Would you be comfortable with GFP_NOWAIT which leaves out __GFP_HIGH ??
>>> 
>>> I will be comfortable when I hear back from Matthew and Liam.
>>> 
>>> :-)
>>> 
>>> 
>>>>> I see the comment that says memory won't actually be allocated. I'm
>>>>> not sure that's the way xa_store() works, however.
>>>> 
>>>> xarray.rst says:
>>>> 
>>>> The xa_store(), xa_cmpxchg(), xa_alloc(),
>>>> xa_reserve() and xa_insert() functions take a gfp_t
>>>> parameter in case the XArray needs to allocate memory to store this entry.
>>>> If the entry is being deleted, no memory allocation needs to be performed,
>>>> and the GFP flags specified will be ignored.`
>>>> 
>>>> The particular context is that a normal pointer is currently stored a
>>>> the given index, and we are replacing that with a number.  The above
>>>> doesn't explicitly say that won't require a memory allocation, but I
>>>> think it is reasonable to say it won't need "to allocate memory to store
>>>> this entry" - as an entry is already stored - so allocation should not
>>>> be needed.
>>> 
>>> xa_mk_value() converts a number to a pointer, and xa_store
>>> stores that pointer.
>>> 
>>> I suspect that xa_store() is allowed to rebalance the
>>> xarray's internal data structures, and that could result
>>> in memory release or allocation. That's why a GFP flag is
>>> one of the arguments.
>> 
>> Matthew says the xa_store() is guaranteed not to do a memory
>> allocation in this case. However, they prefer an annotation
>> of the call site with a "0" GFP argument to show that the
>> allocation flags are not relevant.
>> 
>> Does this:
>> 
>> xa_store(&ses->se_slots, i, xa_mk_value(slot->sl_seqid), 0);
>> 
>> work for you?
> 
> Sure.
> And it looks like sparse will be happy even though "0" isn't explicitly
> "gfp_t" because 0 is "special".
> 
> I might prefer GFP_NULL or similar, but 0 certainly works for me.  I'll
> include that when I resend.

Matthew suggested GFP_NOALLOC. But neither of these symbolic
flags exist yet. I'd rather not hold up this series behind
the bikeshedding of the flag name ;-)

--
Chuck Lever