Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client

Bagas Sanjaya <bagasdotme@xxxxxxxxx> · Thu, 30 Nov 2023 14:36:32 +0700

On Thu, Nov 30, 2023 at 10:52:59AM +0530, Sukruth Sridharan (he/him) wrote:
> I notice the following hung task panic on 6.2.0-34 kernel during RDMA disconnect
> 
> [Wed Nov  1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> for more than 120 seconds.
> [Wed Nov  1 08:03:55 2023]       Tainted: G        W  OE
> 6.2.0-34-generic #34-Ubuntu
> [Wed Nov  1 08:03:55 2023] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Nov  1 08:03:55 2023] task:kworker/u16:5   state:D stack:0
> pid:2274646 ppid:2      flags:0x00004000
> [Wed Nov  1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> [Wed Nov  1 08:03:55 2023] Call Trace:
> [Wed Nov  1 08:03:55 2023]  <TASK>
> [Wed Nov  1 08:03:55 2023]  __schedule+0x2aa/0x610
> [Wed Nov  1 08:03:55 2023]  schedule+0x63/0x110
> [Wed Nov  1 08:03:55 2023]  schedule_timeout+0x157/0x170
> [Wed Nov  1 08:03:55 2023]  wait_for_completion+0x88/0x150
> [Wed Nov  1 08:03:55 2023]  rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> [Wed Nov  1 08:03:55 2023]  xprt_rdma_close+0x12/0x40 [rpcrdma]
> [Wed Nov  1 08:03:55 2023]  xprt_autoclose+0x5c/0x120 [sunrpc]
> [Wed Nov  1 08:03:55 2023]  process_one_work+0x225/0x430
> [Wed Nov  1 08:03:55 2023]  worker_thread+0x50/0x3e0
> [Wed Nov  1 08:03:55 2023]  ? __pfx_worker_thread+0x10/0x10
> [Wed Nov  1 08:03:55 2023]  kthread+0xe9/0x110
> [Wed Nov  1 08:03:55 2023]  ? __pfx_kthread+0x10/0x10
> [Wed Nov  1 08:03:55 2023]  ret_from_fork+0x2c/0x50
> [Wed Nov  1 08:03:55 2023]  </TASK>
> 
> The flow which induced the bug is as follows:
> 1. Client initiates connection
> 2. Server hands off the response to the first RPC on the connection to
> the NIC (Mellanox ConnectX-5)
> 3. NIC tries to send the response around 6 times and fails the response with RNR
> 4. Client issues disconnect (possibly because it didn't receive a response)
> 5. Server cleans up the connection state
> 6. Client runs into the above panic as part of disconnect while draining the IOs
> 
> It looks like re_receiving is set only in rpcrdma_post_recvs, and the
> reason why it wouldn't be reset is if memory-region allocation code
> fails.
> That is possible if disconnect on the client somehow blocks allocation.
> 
> void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> {
>         // ... (some initialization code)
> 
>     if (atomic_inc_return(&ep->re_receiving) > 1)
>         goto out;
> 
>         // ... (some allocation code)
> 
>     if (!wr) // <<<<<<<<<<<<<<<<<< PROBLEM HERE >>>>>>>>>>>>>>>>>>>
>         goto out;
> 
>         // ... (post recv code, and some error handling)
> 
>     if (atomic_dec_return(&ep->re_receiving) > 0)
>         complete(&ep->re_done);
> 
> out:
>     trace_xprtrdma_post_recvs(r_xprt, count);
>     ep->re_receive_count += count;
>     return;
> }
> 
> static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> {
>     struct rpcrdma_ep *ep = r_xprt->rx_ep;
>     struct rdma_cm_id *id = ep->re_id;
> 
>     /* Wait for rpcrdma_post_recvs() to leave its critical
>      * section.
>      */
>     if (atomic_inc_return(&ep->re_receiving) > 1) //
> <<<<<<<<<<<<<<<<<<< This is not reset, so wait gets stuck
> >>>>>>>>>>>>>>>>>
>         wait_for_completion(&ep->re_done);
> 
>     /* Flush Receives, then wait for deferred Reply work
>      * to complete.
>      */
>     ib_drain_rq(id->qp);
> 
>     /* Deferred Reply processing might have scheduled
>      * local invalidations.
>      */
>     ib_drain_sq(id->qp);
> 
>     rpcrdma_ep_put(ep);
> }
> 
> Can you help conclude if the above theory around the bug being in the
> client code is right? If not, can you help with steps/data points
> required to debug this further?
> 

Can you verify that the bug still occurs with latest vanilla kernel
(currently v6.7-rc3)?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara
Attachment:
signature.asc

Description: PGP signature