Re: [PATCH 2/8] nfsd: fix CB_SEQUENCE error handling of NFS4ERR_{BADSLOT,BADSESSION,SEQ_MISORDERED}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/23/25 3:25 PM, Jeff Layton wrote:
The current error handling has some problems:

BADSLOT and BADSESSION: don't release the slot before retrying the call

SEQ_MISORDERED: does some sketchy resetting of the seqid? I can't find any
recommendation about doing that in the spec, and it seems wrong.

Random thought: You might use the Linux NFS client's forechannel session
implementation as a code reference.


Handle all three errors the same way: release the slot, but then handle
it just like we would as if we hadn't gotten a reply; mark the session
as faulty, and retry the call.

Some questions:

Why does it matter whether NFSD keeps the slot if both sides plan to
destroy the session?

Also, AFAICT marking CB_FAULT does not destroy the session, it simply
tries to recreate backchannel's rpc_clnt. Perhaps NFSD's callback code
should actively destroy the session and let the client drive a fresh
CREATE_SESSION to recover?


Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
  fs/nfsd/nfs4callback.c | 27 +++++++++++----------------
  1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index e12205ef16ca932ffbcc86d67b0817aec2436c89..bfc9de1fcb67b4f05ed2f7a28038cd8290809c17 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1371,17 +1371,24 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
  		nfsd4_mark_cb_fault(cb->cb_clp);
  		ret = false;
  		break;
+	case -NFS4ERR_BADSESSION:
+	case -NFS4ERR_BADSLOT:
+	case -NFS4ERR_SEQ_MISORDERED:
+		/*
+		 * These errors indicate that something has gone wrong
+		 * with the server and client's synchronization. Release
+		 * the slot, but handle it as if we hadn't gotten a reply.
+		 */
+		nfsd41_cb_release_slot(cb);
+		fallthrough;
  	case 1:
  		/*
  		 * cb_seq_status remains 1 if an RPC Reply was never
  		 * received. NFSD can't know if the client processed
  		 * the CB_SEQUENCE operation. Ask the client to send a
-		 * DESTROY_SESSION to recover.
+		 * DESTROY_SESSION to recover, but keep the slot.
  		 */
-		fallthrough;
-	case -NFS4ERR_BADSESSION:
  		nfsd4_mark_cb_fault(cb->cb_clp);
-		ret = false;
  		goto need_restart;
  	case -NFS4ERR_DELAY:
  		cb->cb_seq_status = 1;
@@ -1390,14 +1397,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
rpc_delay(task, 2 * HZ);
  		return false;
-	case -NFS4ERR_BADSLOT:
-		goto retry_nowait;
-	case -NFS4ERR_SEQ_MISORDERED:
-		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
-			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
-			goto retry_nowait;
-		}
-		break;
  	default:
  		nfsd4_mark_cb_fault(cb->cb_clp);
  	}
@@ -1405,10 +1404,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
  	nfsd41_cb_release_slot(cb);
  out:
  	return ret;
-retry_nowait:
-	if (rpc_restart_call_prepare(task))
-		ret = false;
-	goto out;
  need_restart:
  	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
  		trace_nfsd_cb_restart(clp, cb);



--
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux