On Tue, Feb 11, 2025 at 11:05 AM Tom Talpey <tom@xxxxxxxxxx> wrote: > > On 2/11/2025 7:26 AM, Rick Macklem wrote: > > On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust > > <trondmy@xxxxxxxxxxxxxxx> wrote: > >> > >> On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote: > >>> On 2/10/2025 8:52 AM, Chuck Lever wrote: > >>>> On 2/9/25 8:34 PM, Rick Macklem wrote: > >>>>> On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust > >>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: > >>>>>> > >>>>>> On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I thought I'd post here instead of nfsv4@xxxxxxxx since I > >>>>>>> think the Linux server has been implementing this recently. > >>>>>>> > >>>>>>> I am not interested in making the FreeBSD NFSv4.1/4.2 > >>>>>>> server dynamically resize slot tables in sessions, but I do > >>>>>>> want to make sure the FreeBSD handles this case correctly. > >>>>>>> > >>>>>>> Here is what I believe is supposed to be done: > >>>>>>> For growing the slot table... > >>>>>>> - Server/replier sends SEQUENCE replies with both > >>>>>>> sr_highest_slot and sr_target_highest_slot set to a > >>>>>>> larger value. > >>>>>>> --> The client can then use those slots with > >>>>>>> sa_sequenceid set to 1 for the first SEQUENCE > >>>>>>> operation on > >>>>>>> each of them. > >>>>>>> > >>>>>>> For shrinking the slot table... > >>>>>>> - Server/replier sends SEQUENCE replies with a smaller > >>>>>>> value for sr_target_highest_slot. > >>>>>>> - The server/replier waits for the client to do a SEQUENCE > >>>>>>> operation on one of the slot(s) where the server has > >>>>>>> replied > >>>>>>> with the smaller value for sr_target_highest_slot with > >>>>>>> a > >>>>>>> sa_highest_slot value <= to the new smaller > >>>>>>> sr_target_highest_slot > >>>>>>> - Once this happens, the server/replier can set > >>>>>>> sr_highest_slot > >>>>>>> to the lower value of sr_target_highest_slot and > >>>>>>> throw the > >>>>>>> slot table entries above that value away. > >>>>>>> --> Once the client sees a reply with sr_target_highest_slot > >>>>>>> set > >>>>>>> to the lower value, it should not do any additional > >>>>>>> SEQUENCE > >>>>>>> operations with a sa_slotid > sr_target_highest_slot > >>>>>>> > >>>>>>> Does the above sound correct? > >>>>>> > >>>>>> The above captures the case where the server is adjusting using > >>>>>> OP_SEQUENCE. However there is another potential mode where the > >>>>>> server > >>>>>> sends out a CB_RECALL_SLOT. > >>>>> Ouch. I completely forgot about this one and I'll admit the > >>>>> FreeBSD client > >>>>> doesn't have it implemented. Btw, I just coded this for the FreeBSD client and used a fake server to test it. I found that wireshark doesn't know how to decode the argument for CB_RECALL_SLOT, which is another hint that it is not being used. (It will take a while to get into releases.) I, personally, think CB_RECALL_SLOT is pretty useless, since it can only be used for sessions with backchannels (no sessionid argument). > >>> > >>> The client is free to refuse to return slots, but the penalty may be > >>> a forcible session disconnect. > >>> > >>> I agree you've captured the basics of the graceful-reduction > >>> scenario, > >>> but I do wonder if nconnect > 1 might impact the termination > >>> condition. > >>> > >>> Because nconnect may impact the ordering of request arrival at the > >>> server, it may be possible to have a timing window where one > >>> connection > >>> may signal a reduction while another connection's request is still > >>> outstanding? > >> > >> Not if the client is doing it right. It doesn't really matter which > >> connections were used, because the client is telling the server that "I > >> have now received all the replies I'm expecting from those slots". > >> > >> IOW: the client is supposed to wait to update the value of > >> sa_highest_slot in OP_SEQUENCE until it has actually received replies > >> for all RPC requests that were sent on the slot(s) being retired. > >> It shouldn't matter if there are duplicate requests or replies > >> outstanding since the client is expected to ignore those (and so the > >> server is indeed free to return NFS4ERR_BADSLOT if it has dropped the > >> cached reply). > >> > >> Now there is a danger if the server starts increasing the value of > >> sr_target_highest_slot before the client is done retiring slots. So > >> don't do that... > > Well, I think both you and Tom are correct, in a sense... > > Here is what RFC8881, sec. 2.10.6.1 says: > > > > The replier SHOULD retain the slots it wants to retire until the > > requester sends a request with a highest_slotid less than or equal > > to the replier's new enforced highest_slotid. > > > > I think the above is at least misleading and maybe outright incorrect. > > So, if the above were considered "correctly done", I think Tom is right. > > I think both Trond and I are right. :) In any event we're not disagreeing, > it's just thaty the client implementation needs to be careful. > If there are multiple forechannels, they all need to be taken > into consideration. The server doesn't have any protocol-specific > guarantee that the client has done so. Therefore it's on the client. All the client needs to do is not use the slots above the new target_highest. To me, it is the server that needs to be careful to not throw away the slots above target_highest before any RPCs issued by the client before the target_highest was lowered might still be in flight. At least that is my current understanding of it, rick > > > I did the original post in part to see if others agreed that the server/replier > > must wait until it sees a SEQUENCE with sa_highest_slot <= the > > new sr_target_highest_slot on a slot where the new sr_target_highest_slot > > has been sent in a previous reply to SEQUENCE. (Without this additional > > requirement of "a slot where..." I think the SEQUENCE could be in an RPC > > that was generated before the client/requestor saw the new > > sr_target_highest_slot. > > > > I might post about this on nfsv4@xxxxxxxx, but I do not know if it could > > be changed as an errata? > > I'm not sure it's wrong, but it could perhaps be clarified if there is > an ambiguity that leads to a flawed implementation. Adding informative > text can be a slippery slope however, it can lead to new ambiguities. > Either way, it's an IETF matter. > > Tom. > > > > > Thanks for all the comments, rick > > > > > >> > >>> > >>> Tom. > >>> > >>> > >>>>> > >>>>> Just fyi, does the Linux server do this, or do I have some time > >>>>> to implement it? > >>>> > >>>> As far as I can tell, Linux NFSD does not yet implement > >>>> CB_RECALL_SLOT. > >> > >> No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as > >> REQuired to implement if the client ever creates a back channel. So > >> other servers may expect it to be implemented. > >> > >>>> > >>>> > >>>>>> In the latter case, it is up to the client to send out enough > >>>>>> SEQUENCE > >>>>>> operations on the forward channel to implicitly acknowledges > >>>>>> the change > >>>>>> in slots using the sa_highestslot field (see RFC8881, Section > >>>>>> 20.8.3). > >>>>>> > >>>>>> If the client was completely idle when it received the > >>>>>> CB_RECALL_SLOT, > >>>>>> it should only need to send out 1 extra SEQUENCE op, but if > >>>>>> using RDMA, > >>>>>> then it has to keep pounding out "RDMA send" messages until the > >>>>>> RDMA > >>>>>> credit count has been brought down too. > >> > >> -- > >> Trond Myklebust > >> Linux NFS client maintainer, Hammerspace > >> trond.myklebust@xxxxxxxxxxxxxxx > >> > >> > > >