On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote: > > On 2/10/2025 8:52 AM, Chuck Lever wrote: > > > On 2/9/25 8:34 PM, Rick Macklem wrote: > > > > On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote: > > > > > > Hi, > > > > > > > > > > > > I thought I'd post here instead of nfsv4@xxxxxxxx since I > > > > > > think the Linux server has been implementing this recently. > > > > > > > > > > > > I am not interested in making the FreeBSD NFSv4.1/4.2 > > > > > > server dynamically resize slot tables in sessions, but I do > > > > > > want to make sure the FreeBSD handles this case correctly. > > > > > > > > > > > > Here is what I believe is supposed to be done: > > > > > > For growing the slot table... > > > > > > - Server/replier sends SEQUENCE replies with both > > > > > > sr_highest_slot and sr_target_highest_slot set to a > > > > > > larger value. > > > > > > --> The client can then use those slots with > > > > > > sa_sequenceid set to 1 for the first SEQUENCE > > > > > > operation on > > > > > > each of them. > > > > > > > > > > > > For shrinking the slot table... > > > > > > - Server/replier sends SEQUENCE replies with a smaller > > > > > > value for sr_target_highest_slot. > > > > > > - The server/replier waits for the client to do a SEQUENCE > > > > > > operation on one of the slot(s) where the server has > > > > > > replied > > > > > > with the smaller value for sr_target_highest_slot with > > > > > > a > > > > > > sa_highest_slot value <= to the new smaller > > > > > > sr_target_highest_slot > > > > > > - Once this happens, the server/replier can set > > > > > > sr_highest_slot > > > > > > to the lower value of sr_target_highest_slot and > > > > > > throw the > > > > > > slot table entries above that value away. > > > > > > --> Once the client sees a reply with sr_target_highest_slot > > > > > > set > > > > > > to the lower value, it should not do any additional > > > > > > SEQUENCE > > > > > > operations with a sa_slotid > sr_target_highest_slot > > > > > > > > > > > > Does the above sound correct? > > > > > > > > > > The above captures the case where the server is adjusting using > > > > > OP_SEQUENCE. However there is another potential mode where the > > > > > server > > > > > sends out a CB_RECALL_SLOT. > > > > Ouch. I completely forgot about this one and I'll admit the > > > > FreeBSD client > > > > doesn't have it implemented. > > > > The client is free to refuse to return slots, but the penalty may be > > a forcible session disconnect. > > > > I agree you've captured the basics of the graceful-reduction > > scenario, > > but I do wonder if nconnect > 1 might impact the termination > > condition. > > > > Because nconnect may impact the ordering of request arrival at the > > server, it may be possible to have a timing window where one > > connection > > may signal a reduction while another connection's request is still > > outstanding? > > Not if the client is doing it right. It doesn't really matter which > connections were used, because the client is telling the server that "I > have now received all the replies I'm expecting from those slots". > > IOW: the client is supposed to wait to update the value of > sa_highest_slot in OP_SEQUENCE until it has actually received replies > for all RPC requests that were sent on the slot(s) being retired. > It shouldn't matter if there are duplicate requests or replies > outstanding since the client is expected to ignore those (and so the > server is indeed free to return NFS4ERR_BADSLOT if it has dropped the > cached reply). > > Now there is a danger if the server starts increasing the value of > sr_target_highest_slot before the client is done retiring slots. So > don't do that... Well, I think both you and Tom are correct, in a sense... Here is what RFC8881, sec. 2.10.6.1 says: The replier SHOULD retain the slots it wants to retire until the requester sends a request with a highest_slotid less than or equal to the replier's new enforced highest_slotid. I think the above is at least misleading and maybe outright incorrect. So, if the above were considered "correctly done", I think Tom is right. I did the original post in part to see if others agreed that the server/replier must wait until it sees a SEQUENCE with sa_highest_slot <= the new sr_target_highest_slot on a slot where the new sr_target_highest_slot has been sent in a previous reply to SEQUENCE. (Without this additional requirement of "a slot where..." I think the SEQUENCE could be in an RPC that was generated before the client/requestor saw the new sr_target_highest_slot. I might post about this on nfsv4@xxxxxxxx, but I do not know if it could be changed as an errata? Thanks for all the comments, rick > > > > > Tom. > > > > > > > > > > > > Just fyi, does the Linux server do this, or do I have some time > > > > to implement it? > > > > > > As far as I can tell, Linux NFSD does not yet implement > > > CB_RECALL_SLOT. > > No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as > REQuired to implement if the client ever creates a back channel. So > other servers may expect it to be implemented. > > > > > > > > > > > > In the latter case, it is up to the client to send out enough > > > > > SEQUENCE > > > > > operations on the forward channel to implicitly acknowledges > > > > > the change > > > > > in slots using the sa_highestslot field (see RFC8881, Section > > > > > 20.8.3). > > > > > > > > > > If the client was completely idle when it received the > > > > > CB_RECALL_SLOT, > > > > > it should only need to send out 1 extra SEQUENCE op, but if > > > > > using RDMA, > > > > > then it has to keep pounding out "RDMA send" messages until the > > > > > RDMA > > > > > credit count has been brought down too. > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@xxxxxxxxxxxxxxx > >