Re: resizing slot tables for sessions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote:
> > On 2/10/2025 8:52 AM, Chuck Lever wrote:
> > > On 2/9/25 8:34 PM, Rick Macklem wrote:
> > > > On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust
> > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I thought I'd post here instead of nfsv4@xxxxxxxx since I
> > > > > > think the Linux server has been implementing this recently.
> > > > > >
> > > > > > I am not interested in making the FreeBSD NFSv4.1/4.2
> > > > > > server dynamically resize slot tables in sessions, but I do
> > > > > > want to make sure the FreeBSD handles this case correctly.
> > > > > >
> > > > > > Here is what I believe is supposed to be done:
> > > > > > For growing the slot table...
> > > > > > - Server/replier sends SEQUENCE replies with both
> > > > > >     sr_highest_slot and sr_target_highest_slot set to a
> > > > > > larger value.
> > > > > > --> The client can then use those slots with
> > > > > >        sa_sequenceid set to 1 for the first SEQUENCE
> > > > > > operation on
> > > > > >        each of them.
> > > > > >
> > > > > > For shrinking the slot table...
> > > > > > - Server/replier sends SEQUENCE replies with a smaller
> > > > > >    value for sr_target_highest_slot.
> > > > > >    - The server/replier waits for the client to do a SEQUENCE
> > > > > >       operation on one of the slot(s) where the server has
> > > > > > replied
> > > > > >       with the smaller value for sr_target_highest_slot with
> > > > > > a
> > > > > >       sa_highest_slot value <= to the new smaller
> > > > > >        sr_target_highest_slot
> > > > > >       - Once this happens, the server/replier can set
> > > > > > sr_highest_slot
> > > > > >          to the lower value of sr_target_highest_slot and
> > > > > > throw the
> > > > > >           slot table entries above that value away.
> > > > > > --> Once the client sees a reply with sr_target_highest_slot
> > > > > > set
> > > > > >        to the lower value, it should not do any additional
> > > > > > SEQUENCE
> > > > > >        operations with a sa_slotid > sr_target_highest_slot
> > > > > >
> > > > > > Does the above sound correct?
> > > > >
> > > > > The above captures the case where the server is adjusting using
> > > > > OP_SEQUENCE. However there is another potential mode where the
> > > > > server
> > > > > sends out a CB_RECALL_SLOT.
> > > > Ouch. I completely forgot about this one and I'll admit the
> > > > FreeBSD client
> > > > doesn't have it implemented.
> >
> > The client is free to refuse to return slots, but the penalty may be
> > a forcible session disconnect.
> >
> > I agree you've captured the basics of the graceful-reduction
> > scenario,
> > but I do wonder if nconnect > 1 might impact the termination
> > condition.
> >
> > Because nconnect may impact the ordering of request arrival at the
> > server, it may be possible to have a timing window where one
> > connection
> > may signal a reduction while another connection's request is still
> > outstanding?
>
> Not if the client is doing it right. It doesn't really matter which
> connections were used, because the client is telling the server that "I
> have now received all the replies I'm expecting from those slots".
>
> IOW: the client is supposed to wait to update the value of
> sa_highest_slot in OP_SEQUENCE until it has actually received replies
> for all RPC requests that were sent on the slot(s) being retired.
> It shouldn't matter if there are duplicate requests or replies
> outstanding since the client is expected to ignore those (and so the
> server is indeed free to return NFS4ERR_BADSLOT if it has dropped the
> cached reply).
>
> Now there is a danger if the server starts increasing the value of
> sr_target_highest_slot before the client is done retiring slots. So
> don't do that...
Well, I think both you and Tom are correct, in a sense...
Here is what RFC8881, sec. 2.10.6.1 says:

     The replier SHOULD retain the slots it wants to retire until the
      requester sends a request with a highest_slotid less than or equal
      to the replier's new enforced highest_slotid.

I think the above is at least misleading and maybe outright incorrect.
So, if the above were considered "correctly done", I think Tom is right.

I did the original post in part to see if others agreed that the server/replier
must wait until it sees a SEQUENCE with sa_highest_slot <= the
new sr_target_highest_slot on a slot where the new sr_target_highest_slot
has been sent in a previous reply to SEQUENCE. (Without this additional
requirement of "a slot where..." I think the SEQUENCE could be in an RPC
that was generated before the client/requestor saw the new
sr_target_highest_slot.

I might post about this on nfsv4@xxxxxxxx, but I do not know if it could
be changed as an errata?

Thanks for all the comments, rick


>
> >
> > Tom.
> >
> >
> > > >
> > > > Just fyi, does the Linux server do this, or do I have some time
> > > > to implement it?
> > >
> > > As far as I can tell, Linux NFSD does not yet implement
> > > CB_RECALL_SLOT.
>
> No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as
> REQuired to implement if the client ever creates a back channel. So
> other servers may expect it to be implemented.
>
> > >
> > >
> > > > > In the latter case, it is up to the client to send out enough
> > > > > SEQUENCE
> > > > > operations on the forward channel to implicitly acknowledges
> > > > > the change
> > > > > in slots using the sa_highestslot field (see RFC8881, Section
> > > > > 20.8.3).
> > > > >
> > > > > If the client was completely idle when it received the
> > > > > CB_RECALL_SLOT,
> > > > > it should only need to send out 1 extra SEQUENCE op, but if
> > > > > using RDMA,
> > > > > then it has to keep pounding out "RDMA send" messages until the
> > > > > RDMA
> > > > > credit count has been brought down too.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux