Re: resizing slot tables for sessions

On 2/11/2025 7:26 AM, Rick Macklem wrote:
On Mon, Feb 10, 2025 at 11:11 AM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:

On Mon, 2025-02-10 at 13:07 -0500, Tom Talpey wrote:
On 2/10/2025 8:52 AM, Chuck Lever wrote:
On 2/9/25 8:34 PM, Rick Macklem wrote:
On Sun, Feb 9, 2025 at 3:34 PM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:

On Sun, 2025-02-09 at 13:39 -0800, Rick Macklem wrote:
Hi,

I thought I'd post here instead of nfsv4@xxxxxxxx since I
think the Linux server has been implementing this recently.

I am not interested in making the FreeBSD NFSv4.1/4.2
server dynamically resize slot tables in sessions, but I do
want to make sure the FreeBSD client handles this case correctly.

Here is what I believe is supposed to be done:
For growing the slot table...
- Server/replier sends SEQUENCE replies with both sr_highest_slot and
  sr_target_highest_slot set to a larger value.
--> The client can then use those slots with sa_sequenceid set to 1
    for the first SEQUENCE operation on each of them.

For shrinking the slot table...
- Server/replier sends SEQUENCE replies with a smaller value for
  sr_target_highest_slot.
  - The server/replier waits for the client to do a SEQUENCE operation
    on one of the slot(s) where the server has replied with the smaller
    value for sr_target_highest_slot, with a sa_highest_slot value <=
    the new smaller sr_target_highest_slot.
    - Once this happens, the server/replier can set sr_highest_slot to
      the lower value of sr_target_highest_slot and throw the slot
      table entries above that value away.
--> Once the client sees a reply with sr_target_highest_slot set to the
    lower value, it should not do any additional SEQUENCE operations
    with a sa_slotid > sr_target_highest_slot.
Does the above sound correct?
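
The grow and shrink steps described above can be sketched as a toy replier-side slot table. This is only an illustration under stated assumptions: the class and method names (SlotTable, grow, shrink, sequence, told) are hypothetical and not taken from FreeBSD or Linux, replay handling is omitted, and the field names follow the shorthand used in this thread (RFC 8881 spells them with an "id" suffix, e.g. sr_target_highest_slotid).

```python
class SlotTable:
    """Toy replier-side slot table (hypothetical names, no replay cache)."""

    def __init__(self, highest_slot):
        self.highest_slot = highest_slot         # returned as sr_highest_slot
        self.target_highest_slot = highest_slot  # returned as sr_target_highest_slot
        # Last sequence id seen per slot; 0 means "never used", so the
        # first SEQUENCE on a fresh slot must carry sa_sequenceid == 1.
        self.seqid = {s: 0 for s in range(highest_slot + 1)}
        # Slots on which a reply has already carried the reduced target.
        self.told = set()

    def grow(self, new_highest):
        # Growing takes effect at once: add fresh slots, advertise both values.
        for s in range(self.highest_slot + 1, new_highest + 1):
            self.seqid[s] = 0
        self.highest_slot = self.target_highest_slot = new_highest
        self.told.clear()

    def shrink(self, new_target):
        # Shrinking is only advertised here; nothing is freed yet.
        self.target_highest_slot = new_target
        self.told.clear()

    def sequence(self, sa_slotid, sa_sequenceid, sa_highest_slot):
        if sa_slotid > self.highest_slot:
            return "NFS4ERR_BADSLOT"
        if sa_sequenceid != self.seqid[sa_slotid] + 1:
            return "NFS4ERR_SEQ_MISORDERED"  # replays are not modelled
        self.seqid[sa_slotid] = sa_sequenceid
        target = self.target_highest_slot
        if (target < self.highest_slot and sa_slotid in self.told
                and sa_highest_slot <= target):
            # The client has seen the new target on this slot and reports
            # nothing outstanding above it: retire the upper slots.
            for s in range(target + 1, self.highest_slot + 1):
                del self.seqid[s]
            self.highest_slot = target
            self.told.clear()
        elif target < self.highest_slot:
            # This reply carries the reduced target for the first time.
            self.told.add(sa_slotid)
        return (self.highest_slot, self.target_highest_slot)


table = SlotTable(3)
table.shrink(1)
print(table.sequence(0, 1, 3))  # (3, 1): reduced target advertised on slot 0
print(table.sequence(0, 2, 1))  # (1, 1): slots 2 and 3 retired
```

The sketch keeps the retired slots until the acknowledging SEQUENCE arrives on a slot that has already carried the smaller target, which is the condition the post asks about.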

The above captures the case where the server is adjusting using
OP_SEQUENCE. However there is another potential mode where the server
sends out a CB_RECALL_SLOT.

Ouch. I completely forgot about this one and I'll admit the FreeBSD
client doesn't have it implemented.

The client is free to refuse to return slots, but the penalty may be
a forcible session disconnect.

I agree you've captured the basics of the graceful-reduction scenario,
but I do wonder if nconnect > 1 might impact the termination condition.

Because nconnect may impact the ordering of request arrival at the
server, it may be possible to have a timing window where one connection
may signal a reduction while another connection's request is still
outstanding?

Not if the client is doing it right. It doesn't really matter which
connections were used, because the client is telling the server that "I
have now received all the replies I'm expecting from those slots".

IOW: the client is supposed to wait to update the value of
sa_highest_slot in OP_SEQUENCE until it has actually received replies
for all RPC requests that were sent on the slot(s) being retired.
It shouldn't matter if there are duplicate requests or replies
outstanding since the client is expected to ignore those (and so the
server is indeed free to return NFS4ERR_BADSLOT if it has dropped the
cached reply).
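
The client-side rule described above can be sketched as follows, assuming the client computes sa_highest_slot as the highest slot that still has a request outstanding. The class and method names (ClientSlots, send, on_reply) are hypothetical, and the sketch covers the shrink case only. The set of outstanding slots is shared across all connections, which is why the connection used does not matter.

```python
class ClientSlots:
    """Toy requester-side slot bookkeeping, shared by all connections."""

    def __init__(self, highest_slot):
        self.in_flight = set()     # slots with a request awaiting its reply
        self.limit = highest_slot  # highest slot usable for new requests

    def send(self, slotid):
        # New requests may only use slots at or below the current limit.
        if slotid > self.limit or slotid in self.in_flight:
            raise ValueError("slot not available")
        self.in_flight.add(slotid)
        # sa_highest_slot: highest slot with a request outstanding,
        # including the one being sent now.
        return max(self.in_flight)

    def on_reply(self, slotid, sr_target_highest_slot):
        self.in_flight.discard(slotid)
        # Shrink-only sketch: never raise the limit from a reply.
        self.limit = min(self.limit, sr_target_highest_slot)


client = ClientSlots(3)
client.send(0)
client.send(3)
client.on_reply(0, 1)    # server asks to shrink to slot 1
print(client.send(1))    # 3: slot 3 is still outstanding, no acknowledgement yet
client.on_reply(3, 1)
client.on_reply(1, 1)
print(client.send(0))    # 0: all retiring slots have drained
```

Because the reported value only falls once every reply on the retiring slots has arrived, a late duplicate on one of those slots cannot be something the client is still waiting for.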

Now there is a danger if the server starts increasing the value of
sr_target_highest_slot before the client is done retiring slots. So
don't do that...
Well, I think both you and Tom are correct, in a sense...
Here is what RFC8881, sec. 2.10.6.1 says:

      The replier SHOULD retain the slots it wants to retire until the
       requester sends a request with a highest_slotid less than or equal
       to the replier's new enforced highest_slotid.

I think the above is at least misleading and maybe outright incorrect.
So, if the above were considered "correctly done", I think Tom is right.

I think both Trond and I are right. :) In any event we're not disagreeing,
it's just that the client implementation needs to be careful.
If there are multiple forechannels, they all need to be taken
into consideration. The server doesn't have any protocol-specific
guarantee that the client has done so. Therefore it's on the client.

I did the original post in part to see if others agreed that the server/replier
must wait until it sees a SEQUENCE with sa_highest_slot <= the
new sr_target_highest_slot on a slot where the new sr_target_highest_slot
has been sent in a previous reply to SEQUENCE. (Without this additional
requirement of "a slot where..." I think the SEQUENCE could be in an RPC
that was generated before the client/requestor saw the new
sr_target_highest_slot.)
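
A small worked example of the timing window described above, as a sketch only. The two predicate names are hypothetical; the first checks sa_highest_slot alone, the second adds the "a slot where..." requirement. In the scenario, the server has lowered the target to 1 and has so far said so only in a reply on slot 0, while request B on slot 1 was generated before the client read that reply.

```python
def may_retire_naive(sa_highest_slot, new_target):
    # Checks only the value reported by the requester.
    return sa_highest_slot <= new_target


def may_retire_strict(sa_slotid, sa_highest_slot, new_target, slots_told):
    # Also requires that a previous reply on this slot carried the new target.
    return sa_slotid in slots_told and sa_highest_slot <= new_target


new_target = 1
slots_told = {0}            # only the reply on slot 0 carried the new target

# Request B: slot 1, sa_highest_slot 1, sent before the client saw that reply.
b_slotid, b_highest = 1, 1

print(may_retire_naive(b_highest, new_target))                         # True
print(may_retire_strict(b_slotid, b_highest, new_target, slots_told))  # False
```

With the first check the server would retire slots 2 and 3 on receipt of B, even though the client, not yet aware of the reduction, may already have started a request on one of them.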

I might post about this on nfsv4@xxxxxxxx, but I do not know if it could
be changed via an erratum?

I'm not sure it's wrong, but it could perhaps be clarified if there is
an ambiguity that leads to a flawed implementation. Adding informative
text can be a slippery slope, however; it can lead to new ambiguities.
Either way, it's an IETF matter.

Tom.


Thanks for all the comments, rick




Tom.



Just fyi, does the Linux server do this, or do I have some time
to implement it?

As far as I can tell, Linux NFSD does not yet implement
CB_RECALL_SLOT.

No, but according to RFC 8881 Section 17, CB_RECALL_SLOT is labelled as
REQuired to implement if the client ever creates a back channel. So
other servers may expect it to be implemented.



In the latter case, it is up to the client to send out enough SEQUENCE
operations on the forward channel to implicitly acknowledge the change
in slots using the sa_highest_slot field (see RFC8881, Section 20.8.3).

If the client was completely idle when it received the CB_RECALL_SLOT,
it should only need to send out 1 extra SEQUENCE op, but if using RDMA,
then it has to keep pounding out "RDMA send" messages until the RDMA
credit count has been brought down too.
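
The idle-client case above can be sketched as a helper that decides which SEQUENCE, if any, the client can send right away to acknowledge a CB_RECALL_SLOT. The function name recall_ack is hypothetical, rca_target_highest_slotid is the argument name from RFC 8881, and RDMA credit handling is deliberately left out.

```python
def recall_ack(in_flight, rca_target_highest_slotid):
    """Return (sa_slotid, sa_highest_slot) for an acknowledging SEQUENCE,
    or None if the client cannot acknowledge yet."""
    target = rca_target_highest_slotid
    # Requests are still outstanding on slots being recalled: wait for them.
    if any(s > target for s in in_flight):
        return None
    free = [s for s in range(target + 1) if s not in in_flight]
    if not free:
        # Every remaining slot is busy; requests already in flight or the
        # next regular request will carry the acknowledgement instead.
        return None
    slot = free[0]
    return slot, max(in_flight | {slot})


print(recall_ack(set(), 1))  # (0, 0): idle client, one extra SEQUENCE suffices
print(recall_ack({3}, 1))    # None: slot 3 must drain first
```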

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx







