Re: resizing slot tables for sessions

Rick Macklem <rick.macklem@xxxxxxxxx> · Thu, 13 Feb 2025 18:39:32 -0800



On Thu, Feb 13, 2025 at 5:59 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 2/13/2025 7:55 PM, Rick Macklem wrote:
> > On Tue, Feb 11, 2025 at 11:05 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
> >>
> >> On 2/11/2025 7:26 AM, Rick Macklem wrote:
> >>> On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust
> >>> <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote:
> >>>>> On 2/10/2025 8:52 AM, Chuck Lever wrote:
> >>>>>> On 2/9/25 8:34 PM, Rick Macklem wrote:
> >>>>>>> On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust
> >>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I thought I'd post here instead of nfsv4@xxxxxxxx since I
> >>>>>>>>> think the Linux server has been implementing this recently.
> >>>>>>>>>
> >>>>>>>>> I am not interested in making the FreeBSD NFSv4.1/4.2
> >>>>>>>>> server dynamically resize slot tables in sessions, but I do
> >>>>>>>>> want to make sure the FreeBSD client handles this case correctly.
> >>>>>>>>>
> >>>>>>>>> Here is what I believe is supposed to be done:
> >>>>>>>>> For growing the slot table...
> >>>>>>>>> - Server/replier sends SEQUENCE replies with both
> >>>>>>>>>   sr_highest_slot and sr_target_highest_slot set to a
> >>>>>>>>>   larger value.
> >>>>>>>>> --> The client can then use those slots with
> >>>>>>>>>     sa_sequenceid set to 1 for the first SEQUENCE
> >>>>>>>>>     operation on each of them.
> >>>>>>>>>
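A minimal sketch of the client side of the growth case, in Python for brevity. ClientSlotTable, start_request() and finish_request() are hypothetical names, not FreeBSD or Linux code. Newly created slots start with sa_sequenceid 1, and each slot's sequence ID advances by one per completed request, wrapping as a 32-bit value as RFC 8881 sec. 2.10.6.1 requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClientSlot:
    # sa_sequenceid to send on the next request on this slot; new slots start at 1.
    next_seqid: int = 1
    busy: bool = False


@dataclass
class ClientSlotTable:
    # Hypothetical client-side slot table; slot 0 always exists.
    slots: List[ClientSlot] = field(default_factory=lambda: [ClientSlot()])
    # Highest slot ID on which a new request may be started.
    usable_highest: int = 0

    def on_sequence_reply(self, sr_highest_slot: int, sr_target_highest_slot: int) -> None:
        # New requests stay at or below both limits from the reply.
        limit = min(sr_highest_slot, sr_target_highest_slot)
        # Growth: append brand-new slots, each starting at sequence ID 1.
        while len(self.slots) <= limit:
            self.slots.append(ClientSlot())
        self.usable_highest = limit

    def start_request(self, slot_id: int) -> int:
        """Mark slot_id busy and return the sa_sequenceid to send on it."""
        if slot_id > self.usable_highest or self.slots[slot_id].busy:
            raise ValueError(f"slot {slot_id} is not available")
        self.slots[slot_id].busy = True
        return self.slots[slot_id].next_seqid

    def finish_request(self, slot_id: int) -> None:
        """Record the reply: free the slot and advance its 32-bit sequence ID."""
        slot = self.slots[slot_id]
        slot.busy = False
        slot.next_seqid = (slot.next_seqid + 1) & 0xFFFFFFFF
```

Taking min() of the two reply fields is a conservative choice: it keeps new requests at or below the target even while the server still permits slots up to sr_highest_slot.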
> >>>>>>>>> For shrinking the slot table...
> >>>>>>>>> - Server/replier sends SEQUENCE replies with a smaller
> >>>>>>>>>   value for sr_target_highest_slot.
> >>>>>>>>>   - The server/replier waits for the client to do a SEQUENCE
> >>>>>>>>>     operation on one of the slot(s) where the server has
> >>>>>>>>>     replied with the smaller value for sr_target_highest_slot,
> >>>>>>>>>     with an sa_highest_slot value <= the new smaller
> >>>>>>>>>     sr_target_highest_slot.
> >>>>>>>>>     - Once this happens, the server/replier can set
> >>>>>>>>>       sr_highest_slot to the lower value of
> >>>>>>>>>       sr_target_highest_slot and throw the slot table
> >>>>>>>>>       entries above that value away.
> >>>>>>>>> --> Once the client sees a reply with sr_target_highest_slot
> >>>>>>>>>     set to the lower value, it should not do any additional
> >>>>>>>>>     SEQUENCE operations with an sa_slotid > sr_target_highest_slot.
> >>>>>>>>> Does the above sound correct?
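The server side of the shrink rule above can be sketched as follows (Python, hypothetical ServerSession class; reply caching and sequence ID checks are omitted). The key point is that a client can only reuse a slot after it has received the previous reply on that slot, so a request arriving on a slot whose last reply carried the lowered target must have been generated after the client saw that target. The sketch assumes the server does not raise the target again while a shrink is pending.

```python
class ServerSession:
    """Hypothetical server-side view of one session's fore-channel slot table."""

    def __init__(self, nslots: int) -> None:
        self.highest = nslots - 1   # sr_highest_slot: the enforced limit
        self.target = nslots - 1    # sr_target_highest_slot: the requested limit
        self.shrinking = False
        # Slots on which a reply has carried the lowered target.
        self.announced_on = set()

    def lower_target(self, new_target: int) -> None:
        self.target = new_target
        self.shrinking = True
        self.announced_on.clear()

    def handle_sequence(self, slot_id: int, sa_highest_slot: int) -> tuple:
        """Process one SEQUENCE; return (sr_highest_slot, sr_target_highest_slot)."""
        if slot_id > self.highest:
            raise ValueError("NFS4ERR_BADSLOT")
        if (self.shrinking and slot_id in self.announced_on
                and sa_highest_slot <= self.target):
            # The client could only reuse slot_id after receiving our reply
            # that carried the lowered target, so this request postdates it.
            self.highest = self.target
            # A real server would free the slot entries above self.highest here.
            self.shrinking = False
        elif self.shrinking:
            # The reply to this request carries the lowered target.
            self.announced_on.add(slot_id)
        return self.highest, self.target
```

For example, after lower_target(3) on an 8-slot session, a first SEQUENCE on slot 2 with sa_highest_slot 3 leaves sr_highest_slot at 7, because it may have been sent before the client saw the new target; a second SEQUENCE on slot 2 then drops sr_highest_slot to 3.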
> >>>>>>>>
> >>>>>>>> The above captures the case where the server is adjusting using
> >>>>>>>> OP_SEQUENCE. However, there is another potential mode where the
> >>>>>>>> server sends out a CB_RECALL_SLOT.
> >>>>>>> Ouch. I completely forgot about this one and I'll admit the
> >>>>>>> FreeBSD client doesn't have it implemented.
> > Btw, I just coded this for the FreeBSD client and used a fake server
> > to test it. I found that wireshark doesn't know how to decode the
> > argument for CB_RECALL_SLOT, which is another hint that it is
> > not being used. (It will take a while to get into releases.)
> >
> > I, personally, think CB_RECALL_SLOT is pretty useless, since it can only be used
> > for sessions with backchannels (it has no sessionid argument).
>
> The primary reason for it is for managing RDMA credit resources used by
> idle clients. If the client is sending no traffic, there are no
> opportunities for the server to send back target slot changes. Since
> RDMA credits consume significant memory and RDMA NIC-based resources,
> releasing these, or sharing them more usefully without just closing
> connections, becomes a big win.
I'll take your word on this. I know nothing about RDMA and FreeBSD's NFS
doesn't support RDMA channels.

>
>
> >>>>>
> >>>>> The client is free to refuse to return slots, but the penalty may be
> >>>>> a forcible session disconnect.
> >>>>>
> >>>>> I agree you've captured the basics of the graceful-reduction
> >>>>> scenario, but I do wonder if nconnect > 1 might impact the
> >>>>> termination condition.
> >>>>>
> >>>>> Because nconnect may impact the ordering of request arrival at the
> >>>>> server, it may be possible to have a timing window where one
> >>>>> connection may signal a reduction while another connection's
> >>>>> request is still outstanding?
> >>>>
> >>>> Not if the client is doing it right. It doesn't really matter which
> >>>> connections were used, because the client is telling the server that "I
> >>>> have now received all the replies I'm expecting from those slots".
> >>>>
> >>>> IOW: the client is supposed to wait to update the value of
> >>>> sa_highest_slot in OP_SEQUENCE until it has actually received replies
> >>>> for all RPC requests that were sent on the slot(s) being retired.
> >>>> It shouldn't matter if there are duplicate requests or replies
> >>>> outstanding since the client is expected to ignore those (and so the
> >>>> server is indeed free to return NFS4ERR_BADSLOT if it has dropped the
> >>>> cached reply).
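Trond's rule can be expressed as the way a client picks a slot and fills in sa_highest_slot. This is a hypothetical Python helper, pick_slot(), not taken from any implementation; choosing the lowest free slot is an assumed policy that keeps the active range compact. The reported value never drops below a slot that is still awaiting a reply, so a retired slot with an outstanding request keeps the server from freeing it.

```python
from typing import Optional, Set, Tuple


def pick_slot(busy: Set[int], target: int) -> Optional[Tuple[int, int]]:
    """Return (sa_slotid, sa_highest_slot) for a new request, or None if no slot is free.

    busy:   slot IDs whose requests are still awaiting replies
    target: the latest sr_target_highest_slot from the server
    """
    # New requests only go on slots at or below the target.
    free = [s for s in range(target + 1) if s not in busy]
    if not free:
        return None
    slot_id = free[0]  # lowest free slot (an assumed policy)
    # Never report a highest slot below one that still awaits a reply,
    # even if that slot is above the target and being retired.
    return slot_id, max([slot_id, *busy])
```

For example, with a target of 3 and slot 5 still outstanding, pick_slot({5}, 3) returns (0, 5): the new request goes on slot 0 but still reports 5 until the reply on slot 5 arrives.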
> >>>>
> >>>> Now there is a danger if the server starts increasing the value of
> >>>> sr_target_highest_slot before the client is done retiring slots. So
> >>>> don't do that...
> >>> Well, I think both you and Tom are correct, in a sense...
> >>> Here is what RFC8881, sec. 2.10.6.1 says:
> >>>
> >>>        The replier SHOULD retain the slots it wants to retire until the
> >>>         requester sends a request with a highest_slotid less than or equal
> >>>         to the replier's new enforced highest_slotid.
> >>>
> >>> I think the above is at least misleading and maybe outright incorrect.
> >>> So, if the above were considered "correctly done", I think Tom is right.
> >>
> >> I think both Trond and I are right. :) In any event we're not disagreeing,
> >> it's just that the client implementation needs to be careful.
> >> If there are multiple forechannels, they all need to be taken
> >> into consideration. The server doesn't have any protocol-specific
> >> guarantee that the client has done so. Therefore it's on the client.
> > All the client needs to do is not use the slots above the new target_highest.
>
> Pretty much, yes.
>
> > To me, it is the server that needs to be careful not to throw away the slots
> > above target_highest while RPCs issued by the client before target_highest
> > was lowered might still be in flight.
>
> Also correct. The slots can be retired (and freed by the server) when
> the client reduces its slot highwater.
True, but the server does have to be careful w.r.t. temporal ordering
(or it must ensure that the SEQUENCE request with a small enough
slot highwater was generated on the client after the client processed
the target highwater in a SEQUENCE reply).

For the SEQUENCE case:
- It must see the small enough high water on a slot where the server
   had sent the new target highwater in a prior reply.
For the CB_RECALL_SLOT case:
- It must see the small enough high water on a slot where the server
   had sent a reply on the slot after receiving the NFS4_OK reply to the
   CB_RECALL_SLOT.
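For the CB_RECALL_SLOT case, the server-side bookkeeping might look like this (Python, hypothetical RecallTracker class). A slot only becomes eligible once the server has replied on it after receiving the NFS4_OK reply to CB_RECALL_SLOT; the next request on that slot is then known to postdate the client's processing of the recall.

```python
NFS4_OK = 0  # nfsstat4 success value


class RecallTracker:
    """Hypothetical server bookkeeping for a shrink started by CB_RECALL_SLOT."""

    def __init__(self, rsa_target_highest_slotid: int) -> None:
        self.target = rsa_target_highest_slotid
        self.recall_acked = False   # NFS4_OK reply to CB_RECALL_SLOT received?
        self.marked = set()         # slots replied on after that acknowledgment

    def on_recall_reply(self, status: int) -> None:
        if status == NFS4_OK:
            self.recall_acked = True

    def on_sequence(self, slot_id: int, sa_highest_slot: int) -> bool:
        """Return True once the slots above the target may be freed."""
        if slot_id in self.marked and sa_highest_slot <= self.target:
            return True
        if self.recall_acked:
            # Our reply to this request is sent after the client acknowledged
            # the recall, so the client's next request on slot_id postdates it.
            self.marked.add(slot_id)
        return False
```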

rick

>
> Tom.
> >
> > At least that is my current understanding of it, rick
> >
> >>
> >>> I did the original post in part to see if others agreed that the server/replier
> >>> must wait until it sees a SEQUENCE with sa_highest_slot <= the
> >>> new sr_target_highest_slot on a slot where the new sr_target_highest_slot
> >>> has been sent in a previous reply to SEQUENCE. (Without this additional
> >>> requirement of "a slot where...", I think the SEQUENCE could be in an RPC
> >>> that was generated before the client/requester saw the new
> >>> sr_target_highest_slot.)
> >>>
> >>> I might post about this on nfsv4@xxxxxxxx, but I do not know if it could
> >>> be changed via an erratum?
> >>
> >> I'm not sure it's wrong, but it could perhaps be clarified if there is
> >> an ambiguity that leads to a flawed implementation. Adding informative
> >> text can be a slippery slope, however; it can lead to new ambiguities.
> >> Either way, it's an IETF matter.
> >>
> >> Tom.
> >>
> >>>
> >>> Thanks for all the comments, rick
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> Tom.
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Just fyi, does the Linux server do this, or do I have some time
> >>>>>>> to implement it?
> >>>>>>
> >>>>>> As far as I can tell, Linux NFSD does not yet implement
> >>>>>> CB_RECALL_SLOT.
> >>>>
> >>>> No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as
> >>>> REQuired to implement if the client ever creates a back channel. So
> >>>> other servers may expect it to be implemented.
> >>>>
> >>>>>>
> >>>>>>
> >>>>>>>> In the latter case, it is up to the client to send out enough
> >>>>>>>> SEQUENCE operations on the forward channel to implicitly
> >>>>>>>> acknowledge the change in slots using the sa_highestslot field
> >>>>>>>> (see RFC8881, Section 20.8.3).
> >>>>>>>>
> >>>>>>>> If the client was completely idle when it received the
> >>>>>>>> CB_RECALL_SLOT, it should only need to send out 1 extra SEQUENCE
> >>>>>>>> op, but if using RDMA, then it has to keep pounding out "RDMA send"
> >>>>>>>> messages until the RDMA credit count has been brought down too.
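A sketch of the client's next step after CB_RECALL_SLOT on a non-RDMA session, following the description in RFC 8881, sec. 20.8.3: wait until every request on a slot above rsa_target_highest_slotid has been answered, then send a SEQUENCE with sa_highest_slot set to the target. recall_slot_step() and its inputs are hypothetical, and the RDMA credit handling described above is not modeled.

```python
from typing import Dict, List, Set, Tuple, Union

Step = Tuple[str, Union[List[int], Dict[str, int]]]


def recall_slot_step(busy: Set[int], target: int) -> Step:
    """Next client action after CB_RECALL_SLOT(target) on a non-RDMA session.

    busy: slot IDs whose requests are still awaiting replies (hypothetical input).
    """
    pending = sorted(s for s in busy if s > target)
    if pending:
        # Replies are still outstanding on slots being retired: keep waiting.
        return ("wait", pending)
    free = [s for s in range(target + 1) if s not in busy]
    if not free:
        # Every slot at or below the target is busy: wait for one to free up.
        return ("wait", sorted(busy))
    # Acknowledge the recall with a SEQUENCE whose highest slot is the target.
    return ("send", {"sa_slotid": free[0], "sa_highest_slot": target})
```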
> >>>>
> >>>> --
> >>>> Trond Myklebust
> >>>> Linux NFS client maintainer, Hammerspace
> >>>> trond.myklebust@xxxxxxxxxxxxxxx
> >>>>
> >>>>
> >>>
> >>
> >
>