> The first available moment after cm_send_req where xprtrdma can post
> Receives is when the RDMA core reports the QP connection has been
> established.
This has significant implications for upper-layer protocols. It
means that the listener cannot safely send the first message on
a new connection. This constrains the upper-layer protocol to
be client/server style, with the active-side ULP sending first.
If the CM is changed in the way described here, peer-to-peer
protocols will be rendered unusable: the passive side cannot
reliably send the first message, since the active side will not
yet have posted a Receive. The MPA Extensions in RFC 6581 discuss
this, and support peer-to-peer connection models with the "A" flag
(section 9.2), which enables passive-first ULPs.
Posting Receives and awakening client processing as proposed below
does not close this race. A passive-side-first protocol will already
have begun to send, regardless of this rearrangement. The race is
inherent, and such protocols will not interoperate reliably.
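To make the passive-first race concrete, here is a toy model in C. It is not the verbs API: `struct toy_qp`, `toy_post_recv()`, `toy_incoming_send()` and `passive_first()` are hypothetical names. An arriving Send that finds no posted Receive is simply counted; a real RC QP would answer with an RNR NAK, which the sender may retry depending on its RNR retry setting. The outcome on the active side depends only on which event it sees first, which is why rearranging the active side's code cannot remove the race.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the active side's QP (not the verbs API). */
struct toy_qp {
	int posted_recvs; /* Receive WRs currently posted */
	int rnr_events;   /* Sends that arrived with no Receive posted */
	int delivered;    /* Sends matched to a posted Receive */
};

static void toy_post_recv(struct toy_qp *qp)
{
	qp->posted_recvs++;
}

/* A Send arrives from the passive peer. With no Receive posted, a real
 * RC QP would return an RNR NAK; this model only counts the event. */
static void toy_incoming_send(struct toy_qp *qp)
{
	if (qp->posted_recvs > 0) {
		qp->posted_recvs--;
		qp->delivered++;
	} else {
		qp->rnr_events++;
	}
}

/* The passive side sends first; the result depends solely on whether
 * the active side managed to post its Receive before the Send landed. */
static struct toy_qp passive_first(bool active_posted_before_send)
{
	struct toy_qp qp = { 0, 0, 0 };

	if (active_posted_before_send) {
		toy_post_recv(&qp);
		toy_incoming_send(&qp);
	} else {
		toy_incoming_send(&qp);
		toy_post_recv(&qp);
	}
	return qp;
}
```

For example, `passive_first(false)` yields one RNR event and no delivered message, even though a Receive ends up posted afterward.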
Why change the CM API? The IB spec is not authoritative on this,
and there currently is no bug, right?
Tom.
On 8/17/2021 2:24 PM, Chuck Lever wrote:
Håkon Bugge points out that rdma_create_qp() is not supposed to
return a QP that is ready for Receives to be posted. It so happens
that ours does, but the IBTA specification (section 12.9.7.1) states
that the transition to INIT happens only after the REQ has been sent.
In future kernels, QPs returned from rdma_create_qp() might not be in
a state where posting Receives will succeed. This patch is a
prerequisite to changing the legacy behavior of rdma_create_qp().
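The state rule behind this paragraph can be sketched as a small model. It assumes a simplified reading of the IB QP state machine: Receive WRs are rejected in RESET, accepted in INIT, RTR and RTS, and flushed in the error state (other states are omitted). `toy_create_qp()` is a hypothetical stand-in for rdma_create_qp(), where `legacy` selects the current behavior (QP already in INIT) and the alternative leaves the QP in RESET until the REQ has been sent. All names here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of QP states (hypothetical names). */
enum toy_qp_state { TOY_RESET, TOY_INIT, TOY_RTR, TOY_RTS, TOY_ERR };

enum toy_post_result { TOY_POST_REJECTED, TOY_POST_OK, TOY_POST_FLUSHED };

/* Assumed, simplified outcome of posting a Receive WR in each state. */
static enum toy_post_result qp_state_post_recv(enum toy_qp_state s)
{
	switch (s) {
	case TOY_RESET:
		return TOY_POST_REJECTED;
	case TOY_INIT:
	case TOY_RTR:
	case TOY_RTS:
		return TOY_POST_OK;
	case TOY_ERR:
	default:
		return TOY_POST_FLUSHED;
	}
}

/* State of a freshly created QP before the CM has sent a REQ:
 * legacy behavior moves it to INIT immediately; the spec ordering
 * leaves it in RESET. */
static enum toy_qp_state toy_create_qp(bool legacy)
{
	return legacy ? TOY_INIT : TOY_RESET;
}
```

Under the non-legacy model, posting right after creation is rejected, which is why the patch defers posting until the connection is established.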
The first available moment after cm_send_req where xprtrdma can post
Receives is when the RDMA core reports the QP connection has been
established.
Note that xprtrdma has posted Receives immediately after
rdma_create_qp() since commit 8d4fb8ff427a ("xprtrdma: Fix disconnect
regression"). To avoid regressing 8d4fb8ff427a, xprtrdma must ensure
that the initial Receive WRs are posted before pending RPCs are
awoken. It appears that the current logic does provide that guarantee.
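The ordering requirement can be expressed as an assertion. This is a minimal sketch with hypothetical names (`struct toy_xprt`, `toy_on_established()`, `toy_wake_pending_rpcs()`); it does not reproduce xprtrdma's code, but models the assumed sequence the patch aims for: the CM reports the connection established, the initial Receive is posted, and only then are pending RPCs woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport state; names are illustrative only. */
struct toy_xprt {
	bool established;
	int posted_recvs;
	int waiters_woken;
};

static void toy_wake_pending_rpcs(struct toy_xprt *x)
{
	/* Ordering the patch description requires to avoid regressing
	 * 8d4fb8ff427a: Receives are posted before any RPC is woken. */
	assert(x->posted_recvs > 0);
	x->waiters_woken++;
}

/* Models the point where the CM reports the connection established,
 * the earliest moment at which the patch posts the initial Receive. */
static void toy_on_established(struct toy_xprt *x)
{
	x->established = true;
	x->posted_recvs += 1; /* cf. rpcrdma_post_recvs(r_xprt, 1, true) */
	toy_wake_pending_rpcs(x);
}
```

Swapping the last two statements in `toy_on_established()` would trip the assertion, which is exactly the ordering the description says must hold.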
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
---
 net/sunrpc/xprtrdma/verbs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index aaec3c9be8db..87ae62cdea18 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -520,12 +520,6 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 	xprt_clear_connected(xprt);
 	rpcrdma_reset_cwnd(r_xprt);
 
-	/* Bump the ep's reference count while there are
-	 * outstanding Receives.
-	 */
-	rpcrdma_ep_get(ep);
-	rpcrdma_post_recvs(r_xprt, 1, true);
-
 	rc = rdma_connect(ep->re_id, &ep->re_remote_cma);
 	if (rc)
 		goto out;
@@ -539,6 +533,12 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 		goto out;
 	}
 
+	/* Bump the ep's reference count while there are
+	 * outstanding Receives.
+	 */
+	rpcrdma_ep_get(ep);
+	rpcrdma_post_recvs(r_xprt, 1, true);
+
 	rc = rpcrdma_sendctxs_create(r_xprt);
 	if (rc) {
 		rc = -ENOTCONN;