On Thu, Feb 13, 2025 at 5:59 PM Tom Talpey <tom@xxxxxxxxxx> wrote: > > On 2/13/2025 7:55 PM, Rick Macklem wrote: > > On Tue, Feb 11, 2025 at 11:05 AM Tom Talpey <tom@xxxxxxxxxx> wrote: > >> > >> On 2/11/2025 7:26 AM, Rick Macklem wrote: > >>> On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust > >>> <trondmy@xxxxxxxxxxxxxxx> wrote: > >>>> > >>>> On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote: > >>>>> On 2/10/2025 8:52 AM, Chuck Lever wrote: > >>>>>> On 2/9/25 8:34 PM, Rick Macklem wrote: > >>>>>>> On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust > >>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: > >>>>>>>> > >>>>>>>> On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I thought I'd post here instead of nfsv4@xxxxxxxx since I > >>>>>>>>> think the Linux server has been implementing this recently. > >>>>>>>>> > >>>>>>>>> I am not interested in making the FreeBSD NFSv4.1/4.2 > >>>>>>>>> server dynamically resize slot tables in sessions, but I do > >>>>>>>>> want to make sure the FreeBSD handles this case correctly. > >>>>>>>>> > >>>>>>>>> Here is what I believe is supposed to be done: > >>>>>>>>> For growing the slot table... > >>>>>>>>> - Server/replier sends SEQUENCE replies with both > >>>>>>>>> sr_highest_slot and sr_target_highest_slot set to a > >>>>>>>>> larger value. > >>>>>>>>> --> The client can then use those slots with > >>>>>>>>> sa_sequenceid set to 1 for the first SEQUENCE > >>>>>>>>> operation on > >>>>>>>>> each of them. > >>>>>>>>> > >>>>>>>>> For shrinking the slot table... > >>>>>>>>> - Server/replier sends SEQUENCE replies with a smaller > >>>>>>>>> value for sr_target_highest_slot. > >>>>>>>>> - The server/replier waits for the client to do a SEQUENCE > >>>>>>>>> operation on one of the slot(s) where the server has > >>>>>>>>> replied > >>>>>>>>> with the smaller value for sr_target_highest_slot with > >>>>>>>>> a > >>>>>>>>> sa_highest_slot value <= to the new smaller > >>>>>>>>> sr_target_highest_slot > >>>>>>>>> - Once this happens, the server/replier can set > >>>>>>>>> sr_highest_slot > >>>>>>>>> to the lower value of sr_target_highest_slot and > >>>>>>>>> throw the > >>>>>>>>> slot table entries above that value away. > >>>>>>>>> --> Once the client sees a reply with sr_target_highest_slot > >>>>>>>>> set > >>>>>>>>> to the lower value, it should not do any additional > >>>>>>>>> SEQUENCE > >>>>>>>>> operations with a sa_slotid > sr_target_highest_slot > >>>>>>>>> > >>>>>>>>> Does the above sound correct? > >>>>>>>> > >>>>>>>> The above captures the case where the server is adjusting using > >>>>>>>> OP_SEQUENCE. However there is another potential mode where the > >>>>>>>> server > >>>>>>>> sends out a CB_RECALL_SLOT. > >>>>>>> Ouch. I completely forgot about this one and I'll admit the > >>>>>>> FreeBSD client > >>>>>>> doesn't have it implemented. > > Btw, I just coded this for the FreeBSD client and used a fake server > > to test it. I found that wireshark doesn't know how to decode the > > argument for CB_RECALL_SLOT, which is another hint that it is > > not being used. (It will take a while to get into releases.) > > > > I, personally, think CB_RECALL_SLOT is pretty useless, since it can only be used > > for sessions with backchannels (no sessionid argument). > > The primary reason for it is for managing RDMA credit resources used by > idle clients. If the client is sending no traffic, there are no > opportunities for the server to send back target slot changes. Since > RDMA credits consume significant memory and RDMA NIC-based resources, > releasing these, or sharing them more usefully without just closing > connections, becomes a big win. I'll take your word on this. I know nothing about RDMA and FreeBSD's NFS doesn't support RDMA channels. > > > >>>>> > >>>>> The client is free to refuse to return slots, but the penalty may be > >>>>> a forcible session disconnect. > >>>>> > >>>>> I agree you've captured the basics of the graceful-reduction > >>>>> scenario, > >>>>> but I do wonder if nconnect > 1 might impact the termination > >>>>> condition. > >>>>> > >>>>> Because nconnect may impact the ordering of request arrival at the > >>>>> server, it may be possible to have a timing window where one > >>>>> connection > >>>>> may signal a reduction while another connection's request is still > >>>>> outstanding? > >>>> > >>>> Not if the client is doing it right. It doesn't really matter which > >>>> connections were used, because the client is telling the server that "I > >>>> have now received all the replies I'm expecting from those slots". > >>>> > >>>> IOW: the client is supposed to wait to update the value of > >>>> sa_highest_slot in OP_SEQUENCE until it has actually received replies > >>>> for all RPC requests that were sent on the slot(s) being retired. > >>>> It shouldn't matter if there are duplicate requests or replies > >>>> outstanding since the client is expected to ignore those (and so the > >>>> server is indeed free to return NFS4ERR_BADSLOT if it has dropped the > >>>> cached reply). > >>>> > >>>> Now there is a danger if the server starts increasing the value of > >>>> sr_target_highest_slot before the client is done retiring slots. So > >>>> don't do that... > >>> Well, I think both you and Tom are correct, in a sense... > >>> Here is what RFC8881, sec. 2.10.6.1 says: > >>> > >>> The replier SHOULD retain the slots it wants to retire until the > >>> requester sends a request with a highest_slotid less than or equal > >>> to the replier's new enforced highest_slotid. > >>> > >>> I think the above is at least misleading and maybe outright incorrect. > >>> So, if the above were considered "correctly done", I think Tom is right. > >> > >> I think both Trond and I are right. :) In any event we're not disagreeing, > >> it's just thaty the client implementation needs to be careful. > >> If there are multiple forechannels, they all need to be taken > >> into consideration. The server doesn't have any protocol-specific > >> guarantee that the client has done so. Therefore it's on the client. > > All the client needs to do is not use the slots above the new target_highest. > > Pretty much, yes. > > > To me, it is the server that needs to be careful to not throw away the slots > > above target_highest before any RPCs issued by the client before the > > target_highest was lowered might still be in flight. > > Also correct. The slots can be retired (and freed by the server) when > the client reduces its slot highwater. True, but the server does have to be careful w.r.t. temporal ordering (or it must ensure that the SEQUENCE request with a small enough slot highwater was generated on the client after the client processed the target highwater in a SEQUENCE reply). For the SEQUENCE case: - It must see the small enough high water on a slot where the server had sent the new target highwater in a prior reply. For the CB_RECALL_SLOT case: - It must see the small enough high water on a slot where the server had sent a reply on the slot after receiving the NFS_OK reply to the CB_RECALL_SLOT. rick > > Tom. > > > > At least that is my current understanding of it, rick > > > >> > >>> I did the original post in part to see if others agreed that the server/replier > >>> must wait until it sees a SEQUENCE with sa_highest_slot <= the > >>> new sr_target_highest_slot on a slot where the new sr_target_highest_slot > >>> has been sent in a previous reply to SEQUENCE. (Without this additional > >>> requirement of "a slot where..." I think the SEQUENCE could be in an RPC > >>> that was generated before the client/requestor saw the new > >>> sr_target_highest_slot. > >>> > >>> I might post about this on nfsv4@xxxxxxxx, but I do not know if it could > >>> be changed as an errata? > >> > >> I'm not sure it's wrong, but it could perhaps be clarified if there is > >> an ambiguity that leads to a flawed implementation. Adding informative > >> text can be a slippery slope however, it can lead to new ambiguities. > >> Either way, it's an IETF matter. > >> > >> Tom. > >> > >>> > >>> Thanks for all the comments, rick > >>> > >>> > >>>> > >>>>> > >>>>> Tom. > >>>>> > >>>>> > >>>>>>> > >>>>>>> Just fyi, does the Linux server do this, or do I have some time > >>>>>>> to implement it? > >>>>>> > >>>>>> As far as I can tell, Linux NFSD does not yet implement > >>>>>> CB_RECALL_SLOT. > >>>> > >>>> No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as > >>>> REQuired to implement if the client ever creates a back channel. So > >>>> other servers may expect it to be implemented. > >>>> > >>>>>> > >>>>>> > >>>>>>>> In the latter case, it is up to the client to send out enough > >>>>>>>> SEQUENCE > >>>>>>>> operations on the forward channel to implicitly acknowledges > >>>>>>>> the change > >>>>>>>> in slots using the sa_highestslot field (see RFC8881, Section > >>>>>>>> 20.8.3). > >>>>>>>> > >>>>>>>> If the client was completely idle when it received the > >>>>>>>> CB_RECALL_SLOT, > >>>>>>>> it should only need to send out 1 extra SEQUENCE op, but if > >>>>>>>> using RDMA, > >>>>>>>> then it has to keep pounding out "RDMA send" messages until the > >>>>>>>> RDMA > >>>>>>>> credit count has been brought down too. > >>>> > >>>> -- > >>>> Trond Myklebust > >>>> Linux NFS client maintainer, Hammerspace > >>>> trond.myklebust@xxxxxxxxxxxxxxx > >>>> > >>>> > >>> > >> > > >