On 1/22/2025 10:20 AM, Chuck Lever wrote:
On 1/22/25 10:10 AM, Jeff Layton wrote:
The v4.0 client always restarts the callback when the connection is shut
down (which is indicated by RPC_SIGNALLED()). The RPC is then requeued
and the result eventually should complete (or be aborted).
The v4.1 code instead processes the result and only later decides to
restart the call. Even more problematic is the fact that it releases the
slot beforehand. The restarted call may get a new slot, which would
could break DRC handling.
"break DRC handling" -- I'd like to understand this.
NFSD always sets cachethis to false in CB_SEQUENCE, so there is no DRC
for these operations. The only thing the client saves is the slot
sequence number IIUC.
Is retrying an uncached operation via a different slot a problem?
The server would ignore any in-progress status and therefore might
cause two requests to be processed in parallel. It might also
rearrange the order of replies. Other than those, maybe not!
I do think it's safest to not reallocate the slot in the 4.1 case.
The slot sequence number is only meaningful in the context of the
original slot.
Tom.
Change nfsd4_cb_sequence_done() such that RPC_SIGNALLED() is handled the
same way irrespective of the minorversion. Check it early, before
processing the CB_SEQUENCE result, and immediately restart the call.
It would be lovely to have some test cases that exercise the server's
backchannel retry logic.
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
fs/nfsd/nfs4callback.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index
50e468bdb8d4838b5217346dcc2bd0fec1765c1a..05afdf94c6478c7d698b059fa33dd9e7af49b91e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1334,21 +1334,24 @@ static bool nfsd4_cb_sequence_done(struct
rpc_task *task, struct nfsd4_callback
struct nfsd4_session *session = clp->cl_cb_session;
bool ret = true;
- if (!clp->cl_minorversion) {
- /*
- * If the backchannel connection was shut down while this
- * task was queued, we need to resubmit it after setting up
- * a new backchannel connection.
- *
- * Note that if we lost our callback connection permanently
- * the submission code will error out, so we don't need to
- * handle that case here.
- */
- if (RPC_SIGNALLED(task))
- goto need_restart;
+ /*
+ * If the backchannel connection was shut down while this
+ * task was queued, resubmit it after setting up a new backchannel
+ * connection.
+ *
+ * If the backchannel connection is permanently lost, the submission
+ * code will error out, so there is no need to handle that case
here.
+ *
+ * For the v4.1+ case, do this without releasing the slot or doing
+ * any further processing. It's possible that the restarted call
will
+ * be a retransmission of an earlier call to the server and that
will
+ * help ensure that the DRC works correctly.
+ */
+ if (RPC_SIGNALLED(task))
+ goto need_restart;
+ if (!clp->cl_minorversion)
return true;
- }
if (cb->cb_held_slot < 0)
goto need_restart;
@@ -1403,9 +1406,6 @@ static bool nfsd4_cb_sequence_done(struct
rpc_task *task, struct nfsd4_callback
}
trace_nfsd_cb_free_slot(task, cb);
nfsd41_cb_release_slot(cb);
-
- if (RPC_SIGNALLED(task))
- goto need_restart;
out:
return ret;
retry_nowait:
---
base-commit: a8acd0de47f22619d62d548b86bcfc9a4de2b2c6
change-id: 20250122-nfsd-6-14-77c1ad5d9f01
Best regards,