Re: [PATCH 1/1] NFSD: fix hang in nfsd4_shutdown_callback

Dai Ngo <dai.ngo@xxxxxxxxxx> · Wed, 29 Jan 2025 15:47:26 -0800

On 1/29/25 2:38 PM, Jeff Layton wrote:
On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote:
On 1/29/25 4:17 PM, Dai Ngo wrote:
If nfs4_client is in COURTESY state then there is no point to retry
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was closed.

This patch modifies nfsd4_cb_sequence_done to skip the restart the
RPC if nfs4_client is in  COURTESY state.

Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
  fs/nfsd/nfs4callback.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 50e468bdb8d4..c90f94898cc5 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
  		ret = false;
  		break;
  	case 1:
+		if (clp->cl_state == NFSD4_COURTESY) {
+			nfsd4_mark_cb_fault(cb->cb_clp);
+			ret = false;
+			break;
+		}
We could do toss this operation here or at the top of
nfsd4_run_cb_work().

Like Olga, I'm wondering if this is the culprit on the recently
reported rpc_shutdown_client hangs.

I'm assuming that with a courtesy client that we don't want to do any
callbacks? If that's the case, then I think doing it early in
nfsd4_run_cb_work() would be better. That would prevent new cb's from
being queued until the cl_state changes too.

I'll move this check into nfsd4_run_cb_work.

Thanks Chuck and Jeff.
-Dai



  		/*
  		 * cb_seq_status remains 1 if an RPC Reply was never
  		 * received. NFSD can't know if the client processed

Nice catch though!