Re: [PATCH 1/1] NFSD: cancel CB_RECALL_ANY call when nfs4_client is about to be destroyed

Dai Ngo <dai.ngo@xxxxxxxxxx> · Fri, 29 Mar 2024 10:57:22 -0700

On 3/29/24 7:55 AM, Chuck Lever wrote:
On Thu, Mar 28, 2024 at 05:31:02PM -0700, Dai Ngo wrote:
On 3/28/24 11:14 AM, Dai Ngo wrote:
On 3/28/24 7:08 AM, Chuck Lever wrote:
On Wed, Mar 27, 2024 at 06:09:28PM -0700, Dai Ngo wrote:
On 3/26/24 11:27 AM, Chuck Lever wrote:
On Tue, Mar 26, 2024 at 11:13:29AM -0700, Dai Ngo wrote:
Currently when a nfs4_client is destroyed we wait for
the cb_recall_any
callback to complete before proceed. This adds
unnecessary delay to the
__destroy_client call if there is problem communicating
with the client.
By "unnecessary delay" do you mean only the seven-second RPC
retransmit timeout, or is there something else?
when the client network interface is down, the RPC task takes ~9s to
send the callback, waits for the reply and gets ETIMEDOUT. This process
repeats in a loop with the same RPC task before being stopped by
rpc_shutdown_client after client lease expires.
I'll have to review this code again, but rpc_shutdown_client
should cause these RPCs to terminate immediately and safely. Can't
we use that?
rpc_shutdown_client works, it terminated the RPC call to stop the loop.

It takes a total of about 1m20s before the CB_RECALL is terminated.
For CB_RECALL_ANY and CB_OFFLOAD, this process gets in to a infinite
loop since there is no delegation conflict and the client is allowed
to stay in courtesy state.

The loop happens because in nfsd4_cb_sequence_done if cb_seq_status
is 1 (an RPC Reply was never received) it calls nfsd4_mark_cb_fault
to set the NFSD4_CB_FAULT bit. It then sets cb_need_restart to true.
When nfsd4_cb_release is called, it checks cb_need_restart bit and
re-queues the work again.
Something in the sequence_done path should check if the server is
tearing down this callback connection. If it doesn't, that is a bug
IMO.
TCP terminated the connection after retrying for 16 minutes and
notified the RPC layer which deleted the nfsd4_conn.
The server should have closed this connection already.

Since there is no delegation conflict, the client remains in courtesy
state.

  Is it stuck
waiting for the client to respond to a FIN or something?

No, in this case since the client network interface was down server
TCP did not even receive ACK for the transmit so the server TCP
kept retrying.

But when nfsd4_run_cb_work ran again, it got into the infinite
loop caused by:
      /*
       * XXX: Ideally, we could wait for the client to
       *      reconnect, but I haven't figured out how
       *      to do that yet.
       */
       nfsd4_queue_cb_delayed(cb, 25);

which was introduced by c1ccfcf1a9bf. Note that I'm using 6.9-rc1.
The whole paragraph is:

1503         clnt = clp->cl_cb_client;
1504         if (!clnt) {
1505                 if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
1506                         nfsd41_destroy_cb(cb);
1507                 else {
1508                         /*
1509                          * XXX: Ideally, we could wait for the client to
1510                          *      reconnect, but I haven't figured out how
1511                          *      to do that yet.
1512                          */
1513                         nfsd4_queue_cb_delayed(cb, 25);
1514                 }
1515                 return;
1516         }

When there's no rpc_clnt and CB_KILL is set, the callback
operation should be destroyed immediately. CB_KILL is set by
nfsd4_shutdown_callback. It's only caller is
__destroy_client().

Why isn't NFSD4_CLIENT_CB_KILL set?

Since __destroy_client was not called, for the reason mentioned above,
nfsd4_shutdown_callback was not called so NFSD4_CLIENT_CB_KILL was not
set.

Since the nfsd callback_wq was created with alloc_ordered_workqueue,
once this loop happens the nfsd callback_wq is stuck since this workqueue
executes at most one work item at any given time.

This means that if a nfs client is shutdown and the server is in this
state then all other clients are effected; all callbacks to other
clients are stuck in the workqueue. And if a callback for a client is
stuck in the workqueue then that client can not unmount its shares.

This is the symptom that was reported recently on this list. I think
the nfsd callback_wq should be created as a normal workqueue that allows
multiple work items to be executed at the same time so a problem with
one client does not effect other clients.

-Dai