Re: resizing slot tables for sessions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 11, 2025 at 11:05 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 2/11/2025 7:26 AM, Rick Macklem wrote:
> > On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote:
> >>> On 2/10/2025 8:52 AM, Chuck Lever wrote:
> >>>> On 2/9/25 8:34 PM, Rick Macklem wrote:
> >>>>> On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust
> >>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I thought I'd post here instead of nfsv4@xxxxxxxx since I
> >>>>>>> think the Linux server has been implementing this recently.
> >>>>>>>
> >>>>>>> I am not interested in making the FreeBSD NFSv4.1/4.2
> >>>>>>> server dynamically resize slot tables in sessions, but I do
> >>>>>>> want to make sure the FreeBSD handles this case correctly.
> >>>>>>>
> >>>>>>> Here is what I believe is supposed to be done:
> >>>>>>> For growing the slot table...
> >>>>>>> - Server/replier sends SEQUENCE replies with both
> >>>>>>>      sr_highest_slot and sr_target_highest_slot set to a
> >>>>>>> larger value.
> >>>>>>> --> The client can then use those slots with
> >>>>>>>         sa_sequenceid set to 1 for the first SEQUENCE
> >>>>>>> operation on
> >>>>>>>         each of them.
> >>>>>>>
> >>>>>>> For shrinking the slot table...
> >>>>>>> - Server/replier sends SEQUENCE replies with a smaller
> >>>>>>>     value for sr_target_highest_slot.
> >>>>>>>     - The server/replier waits for the client to do a SEQUENCE
> >>>>>>>        operation on one of the slot(s) where the server has
> >>>>>>> replied
> >>>>>>>        with the smaller value for sr_target_highest_slot with
> >>>>>>> a
> >>>>>>>        sa_highest_slot value <= to the new smaller
> >>>>>>>         sr_target_highest_slot
> >>>>>>>        - Once this happens, the server/replier can set
> >>>>>>> sr_highest_slot
> >>>>>>>           to the lower value of sr_target_highest_slot and
> >>>>>>> throw the
> >>>>>>>            slot table entries above that value away.
> >>>>>>> --> Once the client sees a reply with sr_target_highest_slot
> >>>>>>> set
> >>>>>>>         to the lower value, it should not do any additional
> >>>>>>> SEQUENCE
> >>>>>>>         operations with a sa_slotid > sr_target_highest_slot
> >>>>>>>
> >>>>>>> Does the above sound correct?
> >>>>>>
> >>>>>> The above captures the case where the server is adjusting using
> >>>>>> OP_SEQUENCE. However there is another potential mode where the
> >>>>>> server
> >>>>>> sends out a CB_RECALL_SLOT.
> >>>>> Ouch. I completely forgot about this one and I'll admit the
> >>>>> FreeBSD client
> >>>>> doesn't have it implemented.
Btw, I just coded this for the FreeBSD client and used a fake server
to test it. I found that wireshark doesn't know how to decode the
argument for CB_RECALL_SLOT, which is another hint that it is
not being used. (It will take a while to get into releases.)

I, personally, think CB_RECALL_SLOT is pretty useless, since it can only be used
for sessions with backchannels (no sessionid argument).

> >>>
> >>> The client is free to refuse to return slots, but the penalty may be
> >>> a forcible session disconnect.
> >>>
> >>> I agree you've captured the basics of the graceful-reduction
> >>> scenario,
> >>> but I do wonder if nconnect > 1 might impact the termination
> >>> condition.
> >>>
> >>> Because nconnect may impact the ordering of request arrival at the
> >>> server, it may be possible to have a timing window where one
> >>> connection
> >>> may signal a reduction while another connection's request is still
> >>> outstanding?
> >>
> >> Not if the client is doing it right. It doesn't really matter which
> >> connections were used, because the client is telling the server that "I
> >> have now received all the replies I'm expecting from those slots".
> >>
> >> IOW: the client is supposed to wait to update the value of
> >> sa_highest_slot in OP_SEQUENCE until it has actually received replies
> >> for all RPC requests that were sent on the slot(s) being retired.
> >> It shouldn't matter if there are duplicate requests or replies
> >> outstanding since the client is expected to ignore those (and so the
> >> server is indeed free to return NFS4ERR_BADSLOT if it has dropped the
> >> cached reply).
> >>
> >> Now there is a danger if the server starts increasing the value of
> >> sr_target_highest_slot before the client is done retiring slots. So
> >> don't do that...
> > Well, I think both you and Tom are correct, in a sense...
> > Here is what RFC8881, sec. 2.10.6.1 says:
> >
> >       The replier SHOULD retain the slots it wants to retire until the
> >        requester sends a request with a highest_slotid less than or equal
> >        to the replier's new enforced highest_slotid.
> >
> > I think the above is at least misleading and maybe outright incorrect.
> > So, if the above were considered "correctly done", I think Tom is right.
>
> I think both Trond and I are right. :) In any event we're not disagreeing,
> it's just thaty the client implementation needs to be careful.
> If there are multiple forechannels, they all need to be taken
> into consideration. The server doesn't have any protocol-specific
> guarantee that the client has done so. Therefore it's on the client.
All the client needs to do is not use the slots above the new target_highest.
To me, it is the server that needs to be careful to not throw away the slots
above target_highest before any RPCs issued by the client before the
target_highest was lowered might still be in flight.

At least that is my current understanding of it, rick

>
> > I did the original post in part to see if others agreed that the server/replier
> > must wait until it sees a SEQUENCE with sa_highest_slot <= the
> > new sr_target_highest_slot on a slot where the new sr_target_highest_slot
> > has been sent in a previous reply to SEQUENCE. (Without this additional
> > requirement of "a slot where..." I think the SEQUENCE could be in an RPC
> > that was generated before the client/requestor saw the new
> > sr_target_highest_slot.
> >
> > I might post about this on nfsv4@xxxxxxxx, but I do not know if it could
> > be changed as an errata?
>
> I'm not sure it's wrong, but it could perhaps be clarified if there is
> an ambiguity that leads to a flawed implementation. Adding informative
> text can be a slippery slope however, it can lead to new ambiguities.
> Either way, it's an IETF matter.
>
> Tom.
>
> >
> > Thanks for all the comments, rick
> >
> >
> >>
> >>>
> >>> Tom.
> >>>
> >>>
> >>>>>
> >>>>> Just fyi, does the Linux server do this, or do I have some time
> >>>>> to implement it?
> >>>>
> >>>> As far as I can tell, Linux NFSD does not yet implement
> >>>> CB_RECALL_SLOT.
> >>
> >> No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as
> >> REQuired to implement if the client ever creates a back channel. So
> >> other servers may expect it to be implemented.
> >>
> >>>>
> >>>>
> >>>>>> In the latter case, it is up to the client to send out enough
> >>>>>> SEQUENCE
> >>>>>> operations on the forward channel to implicitly acknowledges
> >>>>>> the change
> >>>>>> in slots using the sa_highestslot field (see RFC8881, Section
> >>>>>> 20.8.3).
> >>>>>>
> >>>>>> If the client was completely idle when it received the
> >>>>>> CB_RECALL_SLOT,
> >>>>>> it should only need to send out 1 extra SEQUENCE op, but if
> >>>>>> using RDMA,
> >>>>>> then it has to keep pounding out "RDMA send" messages until the
> >>>>>> RDMA
> >>>>>> credit count has been brought down too.
> >>
> >> --
> >> Trond Myklebust
> >> Linux NFS client maintainer, Hammerspace
> >> trond.myklebust@xxxxxxxxxxxxxxx
> >>
> >>
> >
>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux