Currently the Receive completion handler refreshes the Receive Queue whenever a successful Receive completion occurs. On disconnect, xprtrdma drains the Receive Queue. The first few Receive completions after a disconnect are typically successful, until the first flushed Receive. This means the Receive completion handler continues to post more Receive WRs after the drain sentinel has been posted. The late- posted Receives flush after the drain sentinel has completed, leading to a crash later in rpcrdma_xprt_disconnect(). To prevent this crash, xprtrdma has to ensure that the Receive handler stops posting Receives before ib_drain_rq() posts its drain sentinel. This patch is probably not sufficient to fully close that window, but does significantly reduce the opportunity for a crash to occur without incurring undue performance overhead. Cc: stable@xxxxxxxxxxxxxxx # v5.7 Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> --- net/sunrpc/xprtrdma/verbs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index ec912cf9c618..1d88685badbe 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1371,8 +1371,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp) { struct rpcrdma_buffer *buf = &r_xprt->rx_buf; struct rpcrdma_ep *ep = r_xprt->rx_ep; + struct ib_qp_init_attr init_attr; struct ib_recv_wr *wr, *bad_wr; struct rpcrdma_rep *rep; + struct ib_qp_attr attr; int needed, count, rc; rc = 0; @@ -1385,6 +1387,11 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp) if (!temp) needed += RPCRDMA_MAX_RECV_BATCH; + if (ib_query_qp(ep->re_id->qp, &attr, IB_QP_STATE, &init_attr)) + goto out; + if (attr.qp_state == IB_QPS_ERR) + goto out; + /* fast path: all needed reps can be found on the free list */ wr = NULL; while (needed) {