Re: Hung task panic as part of NFS RDMA Disconnect due to possible bug on 6.2.0-34-generic client

> On Nov 30, 2023, at 12:22 AM, Sukruth Sridharan (he/him) <susridharan@xxxxxxxxxxxxxxx> wrote:
> 
> I notice the following hung task panic on the 6.2.0-34 kernel during an RDMA disconnect:
> 
> [Wed Nov  1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked
> for more than 120 seconds.
> [Wed Nov  1 08:03:55 2023]       Tainted: G        W  OE
> 6.2.0-34-generic #34-Ubuntu
> [Wed Nov  1 08:03:55 2023] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Nov  1 08:03:55 2023] task:kworker/u16:5   state:D stack:0
> pid:2274646 ppid:2      flags:0x00004000
> [Wed Nov  1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
> [Wed Nov  1 08:03:55 2023] Call Trace:
> [Wed Nov  1 08:03:55 2023]  <TASK>
> [Wed Nov  1 08:03:55 2023]  __schedule+0x2aa/0x610
> [Wed Nov  1 08:03:55 2023]  schedule+0x63/0x110
> [Wed Nov  1 08:03:55 2023]  schedule_timeout+0x157/0x170
> [Wed Nov  1 08:03:55 2023]  wait_for_completion+0x88/0x150
> [Wed Nov  1 08:03:55 2023]  rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
> [Wed Nov  1 08:03:55 2023]  xprt_rdma_close+0x12/0x40 [rpcrdma]
> [Wed Nov  1 08:03:55 2023]  xprt_autoclose+0x5c/0x120 [sunrpc]
> [Wed Nov  1 08:03:55 2023]  process_one_work+0x225/0x430
> [Wed Nov  1 08:03:55 2023]  worker_thread+0x50/0x3e0
> [Wed Nov  1 08:03:55 2023]  ? __pfx_worker_thread+0x10/0x10
> [Wed Nov  1 08:03:55 2023]  kthread+0xe9/0x110
> [Wed Nov  1 08:03:55 2023]  ? __pfx_kthread+0x10/0x10
> [Wed Nov  1 08:03:55 2023]  ret_from_fork+0x2c/0x50
> [Wed Nov  1 08:03:55 2023]  </TASK>

Hi Sukruth -

This isn't a panic, fortunately. It's simply the hung-task
watchdog reporting that the task has made no progress for more
than 120 seconds. More below...


> The flow that induced the bug is as follows:
> 1. Client initiates a connection
> 2. Server hands off the response to the first RPC on the connection to
> the NIC (Mellanox ConnectX-5)
> 3. NIC tries to send the response about 6 times, then fails it with an
> RNR (Receiver Not Ready) error
> 4. Client issues a disconnect (possibly because it didn't receive a response)
> 5. Server cleans up the connection state
> 6. Client runs into the above hang as part of the disconnect while
> draining the I/Os
> 
> It looks like re_receiving is decremented only in rpcrdma_post_recvs(),
> and it would never be reset if the memory-region allocation code fails,
> because the "if (!wr)" check below jumps past the balancing decrement.
> That is possible if disconnect on the client somehow blocks allocation.
> 
> void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
> {
>    // ... (some initialization code)
> 
>    if (atomic_inc_return(&ep->re_receiving) > 1)
>        goto out;
> 
>    // ... (some allocation code)
> 
>    if (!wr) // <<< PROBLEM HERE: early exit skips the decrement below >>>
>        goto out;
> 
>    // ... (post recv code, and some error handling)
> 
>    if (atomic_dec_return(&ep->re_receiving) > 0)
>        complete(&ep->re_done);
> 
> out:
>    trace_xprtrdma_post_recvs(r_xprt, count);
>    ep->re_receive_count += count;
>    return;
> }
> 
> static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
> {
>    struct rpcrdma_ep *ep = r_xprt->rx_ep;
>    struct rdma_cm_id *id = ep->re_id;
> 
>    /* Wait for rpcrdma_post_recvs() to leave its critical
>     * section.
>     */
>    if (atomic_inc_return(&ep->re_receiving) > 1) // <<< This is not reset, so the wait gets stuck >>>
>        wait_for_completion(&ep->re_done);
> 
>    /* Flush Receives, then wait for deferred Reply work
>     * to complete.
>     */
>    ib_drain_rq(id->qp);
> 
>    /* Deferred Reply processing might have scheduled
>     * local invalidations.
>     */
>    ib_drain_sq(id->qp);
> 
>    rpcrdma_ep_put(ep);
> }
> 
> Can you help confirm whether the above theory, that the bug is in the
> client code, is correct? If not, can you suggest the steps/data points
> required to debug this further?
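
For what it's worth, the handshake your theory targets is easy to
model outside the kernel. Below is a minimal userspace sketch, with
C11 atomics and a pthread condition variable standing in for the
kernel's atomic_t and struct completion. The names and structure are
illustrative only, not the actual xprtrdma code:

/*
 * Userspace model of the re_receiving handshake between
 * rpcrdma_post_recvs() and rpcrdma_xprt_drain().
 * Illustrative sketch only -- not kernel code.
 *
 * Build: cc -std=c11 -pthread -o model model.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int re_receiving;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done;

/* Stand-in for complete(&ep->re_done) */
static void complete_re_done(void)
{
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Stand-in for wait_for_completion(&ep->re_done) */
static void wait_for_re_done(void)
{
    pthread_mutex_lock(&lock);
    while (!done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Models rpcrdma_post_recvs(). When alloc_fails is true, it takes
 * the "if (!wr) goto out;" early exit and never performs the
 * balancing decrement -- the imbalance under suspicion.
 */
static void post_recvs(bool alloc_fails)
{
    if (atomic_fetch_add(&re_receiving, 1) + 1 > 1)
        return;              /* another context is already posting */

    if (alloc_fails)
        return;              /* early exit: decrement is skipped */

    /* ... ib_post_recv() would happen here ... */

    if (atomic_fetch_sub(&re_receiving, 1) - 1 > 0)
        complete_re_done();
}

/* Models rpcrdma_xprt_drain(): wait for the posting side to leave
 * its critical section before draining the queues.
 */
static void *drain(void *arg)
{
    (void)arg;
    if (atomic_fetch_add(&re_receiving, 1) + 1 > 1)
        wait_for_re_done();  /* blocks forever if the poster bailed */
    printf("drain: proceeding to ib_drain_rq/ib_drain_sq\n");
    return NULL;
}

int main(void)
{
    pthread_t t;

    post_recvs(true);        /* flip to false and the drain completes */
    pthread_create(&t, NULL, drain, NULL);
    pthread_join(t, NULL);   /* with 'true' above, this never returns */
    return 0;
}

Run as-is and the drain thread waits forever, the same shape as the
stack trace above. That shows only that the theory is internally
consistent, not that it is what your client is hitting; a Receive
path that breaks after a reconnect can present the same way.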

Try applying:

895cedc17919 ("xprtrdma: Remap Receive buffers after a reconnect")


--
Chuck Lever





