Re: [PATCH v2 1/3] NFSD: mark cl_cb_state as NFSD4_CB_DOWN if cl_cb_client is NULL

On 4/23/24 11:08 AM, Chuck Lever wrote:
On Tue, Apr 23, 2024 at 10:49:25AM -0700, Dai Ngo wrote:
On 4/23/24 6:41 AM, Chuck Lever wrote:
On Mon, Apr 22, 2024 at 08:12:31PM -0700, Dai Ngo wrote:
In nfsd4_run_cb_work, if the rpc_clnt for the back channel no longer
exists, the callback state in nfs4_client should be marked as NFSD4_CB_DOWN
so the server can notify the client to establish a new back channel
connection.

Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
   fs/nfsd/nfs4callback.c | 9 +++++++--
   1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index cf87ace7a1b0..f8bb5ff2e9ac 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1491,9 +1491,14 @@ nfsd4_run_cb_work(struct work_struct *work)
   	clnt = clp->cl_cb_client;
   	if (!clnt) {
-		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
+		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
   			nfsd41_destroy_cb(cb);
-		else {
+			clear_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags);
+
+			/* let client knows BC is down when it reconnects */
+			clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
+			nfsd4_mark_cb_down(clp);
+		} else {
   			/*
   			 * XXX: Ideally, we could wait for the client to
   			 *	reconnect, but I haven't figured out how
NFSD4_CLIENT_CB_KILL is for when the lease is getting expunged. It's
not supposed to be used when only the transport is closed.
The reason NFSD4_CLIENT_CB_KILL needs to be set when the transport is
closed is because of commit c1ccfcf1a9bf3.

When the transport is closed, nfsd4_conn_lost is called which then calls
nfsd4_probe_callback to set NFSD4_CLIENT_CB_UPDATE and schedule cl_cb_null
work to activate the callback worker (nfsd4_run_cb_work) to do the update.

The callback worker calls nfsd4_process_cb_update, which does
rpc_shutdown_client and then clears cl_cb_client.

When nfsd4_process_cb_update returns to nfsd4_run_cb_work, if cl_cb_client
is NULL and NFSD4_CLIENT_CB_KILL is not set, then it re-queues the callback,
causing an infinite loop.
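
The re-queue behaviour described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct model_client`, `model_run_cb_work` and `model_count_requeues` are hypothetical names, not kernel code, and the model keeps just the two inputs the quoted hunk tests (whether cl_cb_client is NULL and whether NFSD4_CLIENT_CB_KILL is set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct nfs4_client; not kernel code. */
struct model_client {
	void *cb_client;	/* models clp->cl_cb_client (NULL after shutdown) */
	bool cb_kill;		/* models NFSD4_CLIENT_CB_KILL in cl_flags */
};

enum model_result { MODEL_RAN, MODEL_DESTROYED, MODEL_REQUEUED };

/* One pass of the worker, following the branch structure in the quoted hunk. */
static enum model_result model_run_cb_work(const struct model_client *clp)
{
	if (clp->cb_client != NULL)
		return MODEL_RAN;	/* a back channel exists */
	if (clp->cb_kill)
		return MODEL_DESTROYED;	/* models nfsd41_destroy_cb(cb) */
	return MODEL_REQUEUED;		/* models re-queueing the callback */
}

/*
 * Repeat the worker pass until it stops re-queueing or max_passes is
 * reached. Returns the number of passes that ended in a re-queue.
 */
static unsigned int model_count_requeues(const struct model_client *clp,
					 unsigned int max_passes)
{
	unsigned int requeues = 0;

	assert(max_passes > 0);
	for (unsigned int i = 0; i < max_passes; i++) {
		if (model_run_cb_work(clp) != MODEL_REQUEUED)
			break;
		requeues++;
	}
	return requeues;
}
```

In this model nothing inside the loop changes `cb_client` or `cb_kill`, so a client with a NULL `cb_client` and no kill flag re-queues on every pass until the cap is hit. Only an outside event (a reconnect, or the kill flag being set) ends it.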
That's the way it is supposed to work today. The callback is
re-queued until the client reconnects, at which point the loop is
broken.

As you mentioned below, this needs to be reworked.

What if the client never comes back, for example because it was
decommissioned, or because a student hibernates a laptop and opens it up a
few days later? Even when the client comes back, it might have been
rebooted, so the callback no longer means anything to it.



Thus, shouldn't you mark_cb_down in this arm, instead?
I'm not clear what you mean here. The callback worker calls
nfsd4_mark_cb_down after destroying the callback.
No, I mean in the re-queue case.

In the case of re-queue, the back channel is already marked as NFSD4_CB_DOWN
and cl_flags is NFSD4_CLIENT_STABLE|NFSD4_CLIENT_RECLAIM_COMPLETE|NFSD4_CLIENT_CONFIRMED:

Apr 23 08:07:23 nfsvmc14 kernel: nfsd4_run_cb_work: NULL cl_cb_client REQUEUE CB cb[ffff888126e8a728] clp[ffff888126e8a430] cl_cb_state[2] cl_flags[0x1c]

but that does not stop the loop.
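
To read the cl_flags[0x1c] value in the log line above, the mask can be decoded bit by bit. The sketch below is a user-space illustration, not kernel code: the `model_*` names are hypothetical, and the bit numbers are an assumption chosen to agree with the flag names given in this message (0x1c is bits 2, 3 and 4). The authoritative definitions live in fs/nfsd/state.h and should be checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit numbers for illustration only; verify against fs/nfsd/state.h. */
enum model_flag_bit {
	MODEL_CB_UPDATE = 0,
	MODEL_CB_KILL = 1,
	MODEL_STABLE = 2,
	MODEL_RECLAIM_COMPLETE = 3,
	MODEL_CONFIRMED = 4,
};

static const char *const model_flag_names[] = {
	[MODEL_CB_UPDATE] = "NFSD4_CLIENT_CB_UPDATE",
	[MODEL_CB_KILL] = "NFSD4_CLIENT_CB_KILL",
	[MODEL_STABLE] = "NFSD4_CLIENT_STABLE",
	[MODEL_RECLAIM_COMPLETE] = "NFSD4_CLIENT_RECLAIM_COMPLETE",
	[MODEL_CONFIRMED] = "NFSD4_CLIENT_CONFIRMED",
};

#define MODEL_NFLAGS (sizeof(model_flag_names) / sizeof(model_flag_names[0]))

/* True if either callback-control bit (update or kill) is set. */
static bool model_cb_action_pending(unsigned long flags)
{
	return (flags & ((1UL << MODEL_CB_UPDATE) | (1UL << MODEL_CB_KILL))) != 0;
}

/* Write a '|'-separated list of the set flag names into buf; returns buf. */
static char *model_decode_flags(unsigned long flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t bit = 0; bit < MODEL_NFLAGS; bit++) {
		size_t used = strlen(buf);

		if (!(flags & (1UL << bit)))
			continue;
		/* snprintf NUL-terminates and truncates if buf is too small. */
		snprintf(buf + used, len - used, "%s%s",
			 used ? "|" : "", model_flag_names[bit]);
	}
	return buf;
}
```

Under these assumed bit numbers, 0x1c decodes to the three flags named above and neither callback-control bit is set, which matches the observation that the worker takes the re-queue arm rather than the destroy arm.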


Even so, isn't the
backchannel already marked down when we get here?
No, according to my testing. Without marking the back channel down the
client does not re-establish the back channel when it reconnects.
I didn't expect that closing the transport on the server side would
need any changes in fs/nfsd/nfs4callback.c. Let me get the
backchannel retransmit behavior sorted first. I'm still working on
setting up a test rig here.

Thanks, I will wait until you sort this out.

-Dai






