Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC_SIGNALLED is set

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/26/25 6:18 AM, Jeff Layton wrote:
On Sun, 2025-01-26 at 10:01 +1100, NeilBrown wrote:
On Fri, 24 Jan 2025, Jeff Layton wrote:
This is problematic, since the RPC might have been entirely successful.
There is no point in restarting a v4.1+ callback just because
RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
detecting when it should retransmit the RPC.

But why might RPC_SIGNALLED() ever be true?
The flag RPC_TASK_SIGNALLED is only ever set by rpc_signal_task() which
is only called from rpc_killall_tasks() and __rpc_execute() for
non-async tasks which doesn't apply to nfsd callbacks as they are
started with rpc_call_async().

rpc_killall_tasks() is called by fs/nfs/ which isn't relevant for us,
and from rpc_shutdown_client().  In those cases we certainly don't want
the request to be retried, though the nfsd4_process_cb_update() case is
a little interesting as we do want it to be retried, but in a different
client.

So the code you are removing is either dead code because something else
will prevent the restart when a client is being shut down, or it is bad
code because it would delay rpc_shutdown_client() while the request is
retried.

I haven't dug the extra step to figure out which, but either way I think
the code should go.

Thanks. That was my analysis too.

Agreed, this code is problematic, but it appears to me that some of
these problems are not resolved by simply removing the signal check.


rpc_shutdown_client() is called when we tear down and rebuild the
rpc_client. nfsd does this in setup_callback_client(), which gets
called from nfsd4_process_cb_update() (basically when we detect that
the backchannel is having problems).

There are really only two states: We either got a reply from the server
before the client went down, or we didn't. In the case where we got a
reply, there is no need to retry anything. In the case where we didn't,
the tk_status will be '1', so there is no need to check RPC_SIGNALLED()
at all here.

Fwiw, the "cb_seq_status == 1" arm skips the signal check in the current
code.


The existing code could lead to the call being retried when we had
already gotten a perfectly valid reply.

Here's a case-by-case audit:

 - NFS_OK: SEQUENCE was decoded and passed sanity checks. So this should
   not ever requeue in here. It might be requeued during subsequent
   processing.

 - ESERVERFAULT: SEQUENCE was decoded but failed sanity checking. The
   reply should be dropped now, and the session marked FAULT. No requeue
   is ever needed here.

   [ I question whether the sequence number should be bumped in this
     case -- the client's callback server replied with the identity of
     some other slot. And anyway, this session is about to become
     toast. ]

 - 1: The timeout case. We want a fresh session here, so it falls
   through to BADSESSION.

 - NFS4ERR_BADSESSION: This needs a requeue so that the operation can
   be retried with a fresh session. But it should always check if the
   rpc_clnt is shutting down before doing so. This is a problem in the
   current code.

 - NFS4ERR_DELAY: Skips the signal check, but shouldn't. If the rpc_clnt
   is shutting down, this RPC should not be requeued.

 - NFS4ERR_BAD_SLOT: Skips the signal check, but shouldn't. I need to
   think more about BAD_SLOT recovery best practice.

 - NFS4ERR_SEQ_MISORDERED: Does one retry with a seq_nr of 1. It
   probably should terminate if that fails. IMO this should check for
   rpc_clnt shutdown before requeuing the retry.

 - default: I don't think this case should ever be requeued, but it
   appears that it could be if the rpc_clnt is shutting down.


Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
  fs/nfsd/nfs4callback.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
  	}
  	trace_nfsd_cb_free_slot(task, cb);
  	nfsd41_cb_release_slot(cb);
-
-	if (RPC_SIGNALLED(task))
-		goto need_restart;
  out:
  	return ret;
  retry_nowait:

--
2.48.1






--
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux