I notice the following hung-task report on the 6.2.0-34 kernel during an RDMA disconnect:

[Wed Nov 1 08:03:54 2023] INFO: task kworker/u16:5:2274646 blocked for more than 120 seconds.
[Wed Nov 1 08:03:55 2023] Tainted: G W OE 6.2.0-34-generic #34-Ubuntu
[Wed Nov 1 08:03:55 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Nov 1 08:03:55 2023] task:kworker/u16:5 state:D stack:0 pid:2274646 ppid:2 flags:0x00004000
[Wed Nov 1 08:03:55 2023] Workqueue: xprtiod xprt_autoclose [sunrpc]
[Wed Nov 1 08:03:55 2023] Call Trace:
[Wed Nov 1 08:03:55 2023]  <TASK>
[Wed Nov 1 08:03:55 2023]  __schedule+0x2aa/0x610
[Wed Nov 1 08:03:55 2023]  schedule+0x63/0x110
[Wed Nov 1 08:03:55 2023]  schedule_timeout+0x157/0x170
[Wed Nov 1 08:03:55 2023]  wait_for_completion+0x88/0x150
[Wed Nov 1 08:03:55 2023]  rpcrdma_xprt_disconnect+0x33f/0x350 [rpcrdma]
[Wed Nov 1 08:03:55 2023]  xprt_rdma_close+0x12/0x40 [rpcrdma]
[Wed Nov 1 08:03:55 2023]  xprt_autoclose+0x5c/0x120 [sunrpc]
[Wed Nov 1 08:03:55 2023]  process_one_work+0x225/0x430
[Wed Nov 1 08:03:55 2023]  worker_thread+0x50/0x3e0
[Wed Nov 1 08:03:55 2023]  ? __pfx_worker_thread+0x10/0x10
[Wed Nov 1 08:03:55 2023]  kthread+0xe9/0x110
[Wed Nov 1 08:03:55 2023]  ? __pfx_kthread+0x10/0x10
[Wed Nov 1 08:03:55 2023]  ret_from_fork+0x2c/0x50
[Wed Nov 1 08:03:55 2023]  </TASK>

The flow that induced the bug is as follows:

1. Client initiates a connection.
2. Server hands off the response to the first RPC on the connection to the NIC (Mellanox ConnectX-5).
3. The NIC retries the response about 6 times and then fails it with an RNR error.
4. Client issues a disconnect (possibly because it did not receive a response).
5. Server cleans up the connection state.
6. Client runs into the hang above while draining I/O as part of the disconnect.

Looking at the code, re_receiving is incremented at the top of rpcrdma_post_recvs(), but the matching decrement is skipped when the allocation code fails and wr stays NULL, because the function jumps straight to the out label. That seems possible here if the disconnect on the client somehow interferes with the allocation.

void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed, bool temp)
{
	// ... (some initialization code)

	if (atomic_inc_return(&ep->re_receiving) > 1)
		goto out;

	// ... (some allocation code)

	if (!wr)	// <<< PROBLEM HERE: jumps to out without decrementing re_receiving >>>
		goto out;

	// ... (post recv code, and some error handling)

	if (atomic_dec_return(&ep->re_receiving) > 0)
		complete(&ep->re_done);

out:
	trace_xprtrdma_post_recvs(r_xprt, count);
	ep->re_receive_count += count;
	return;
}

static void rpcrdma_xprt_drain(struct rpcrdma_xprt *r_xprt)
{
	struct rpcrdma_ep *ep = r_xprt->rx_ep;
	struct rdma_cm_id *id = ep->re_id;

	/* Wait for rpcrdma_post_recvs() to leave its critical
	 * section.
	 */
	if (atomic_inc_return(&ep->re_receiving) > 1)	// <<< re_receiving was never decremented, so this waits forever >>>
		wait_for_completion(&ep->re_done);

	/* Flush Receives, then wait for deferred Reply work
	 * to complete.
	 */
	ib_drain_rq(id->qp);

	/* Deferred Reply processing might have scheduled
	 * local invalidations.
	 */
	ib_drain_sq(id->qp);

	rpcrdma_ep_put(ep);
}

Can you help confirm whether the above theory, that the bug is in the client code, is right? If not, could you suggest the steps or data points needed to debug this further?

Thanks in advance.
Sukruth
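
P.S. To make the theory concrete: if the analysis above is right, I would expect a change along these lines on the early-exit path to avoid the hang. This is only an untested sketch based on my reading of rpcrdma_post_recvs(), not a proposed patch:

	if (!wr) {
		/* Nothing was allocated, so nothing will be posted.
		 * Drop the count taken above and wake any waiter in
		 * rpcrdma_xprt_drain(), otherwise it blocks forever
		 * on re_done.
		 */
		if (atomic_dec_return(&ep->re_receiving) > 0)
			complete(&ep->re_done);
		goto out;
	}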