Re: [PATCH v1 2/8] xprtrdma: Do not post Receives after disconnect

Tom Talpey <tom@xxxxxxxxxx> · Wed, 31 Mar 2021 15:59:35 -0400

On 3/31/2021 3:36 PM, Chuck Lever wrote:
Currently the Receive completion handler refreshes the Receive Queue
whenever a successful Receive completion occurs.

On disconnect, xprtrdma drains the Receive Queue. The first few
Receive completions after a disconnect are typically successful,
until the first flushed Receive.

This means the Receive completion handler continues to post more
Receive WRs after the drain sentinel has been posted. The late-
posted Receives flush after the drain sentinel has completed,
leading to a crash later in rpcrdma_xprt_disconnect().

To prevent this crash, xprtrdma has to ensure that the Receive
handler stops posting Receives before ib_drain_rq() posts its
drain sentinel.

This patch is probably not sufficient to fully close that window,

"Probably" is not a word I'd like to use in a stable:cc...

but does significantly reduce the opportunity for a crash to
occur without incurring undue performance overhead.

Cc: stable@xxxxxxxxxxxxxxx # v5.7
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
---
  net/sunrpc/xprtrdma/verbs.c |    7 +++++++
  1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ec912cf9c618..1d88685badbe 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1371,8 +1371,10 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp)
  {
  	struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
  	struct rpcrdma_ep *ep = r_xprt->rx_ep;
+	struct ib_qp_init_attr init_attr;
  	struct ib_recv_wr *wr, *bad_wr;
  	struct rpcrdma_rep *rep;
+	struct ib_qp_attr attr;
  	int needed, count, rc;
  
  	rc = 0;
@@ -1385,6 +1387,11 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp)
  	if (!temp)
  		needed += RPCRDMA_MAX_RECV_BATCH;
  
+	if (ib_query_qp(ep->re_id->qp, &attr, IB_QP_STATE, &init_attr))
+		goto out;

This call isn't completely cheap.

+	if (attr.qp_state == IB_QPS_ERR)
+		goto out;

But the QP is free to disconnect or go to error right now. This approach
just reduces the timing hole. Is it not possible to mark the WRs as
being part of a batch, and allowing them to flush? You could borrow a
bit in the completion cookie, and check it when the CQE pops out. Maybe.

+
  	/* fast path: all needed reps can be found on the free list */
  	wr = NULL;
  	while (needed) {