On 14/10/2022 10:35, Matsuda, Daisuke/松田 大輔 wrote:
On Thu, Oct 13, 2022 2:36 PM Li Zhijian wrote:
On 13/10/2022 09:47, Daisuke Matsuda wrote:
Requesting nodes do not handle a reported error correctly if it is
generated in the middle of multi-packet Read responses, and the node tries
to resend the request endlessly. Let completer terminate the connection in
that case.
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
---
FOR REVIEWERS:
I referred to IB Specification Vol 1-Revision-1.5 to make this patch.
Please see Ch.9.9.2.2, Ch.9.9.2.4.2 and Table 59.
drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index fb0c008af78c..c9170dd99f3a 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -200,6 +200,10 @@ static inline enum comp_state check_psn(struct rxe_qp *qp,
*/
if (pkt->psn == wqe->last_psn)
return COMPST_COMP_ACK;
+ else if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE &&
+ (qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST ||
+ qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE))
Hi Zhijian,
Thank you for taking a look.
When IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST or IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE will
be assigned to qp->comp.opcode ?
This happens in rxe_completer(). Upon receiving a Read reply, 'state' is changed in the following order:
[ 101.215593] rdma_rxe: qp#17 state = GET ACK
[ 101.216174] rdma_rxe: qp#17 state = GET WQE
[ 101.216776] rdma_rxe: qp#17 state = CHECK PSN
[ 101.217384] rdma_rxe: qp#17 state = CHECK ACK
[ 101.218008] rdma_rxe: qp#17 state = READ
[ 101.218556] rdma_rxe: qp#17 state = UPDATE COMP
[ 101.219188] rdma_rxe: qp#17 state = DONE
'qp->comp.opcode' is filled at COMPST_UPDATE_COMP, and the value is retained
until it is overridden by the next incoming reply.
Thanks for your explanation.
I wonder if "(pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE) " is enough ?
I am not very sure if it is better. I agree your suggestion is correct within the
IB transport layer, but RoCEv2 traffic is encapsulated in UDP/IP headers.
We may need to take it into consideration that UDP does not guarantee ordered
delivery of packets.
For example, responder can return coalesced ACKs for RDMA Write requests.
If ACK packets are swapped in the course of delivery, a packet with older psn can
arrive later, and probably it is to be discarded by returning COMPST_DONE here.
If we just put "(pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE) ",
I am afraid the swapped ACKs may go into wrong path. This is based on my
speculation, so please let me know if it is wrong.
Well, you current changes are quite fit to your patch subject :)
Reviewed-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
Thanks,
Daisuke
Thanks
Zhijian
+ return COMPST_CHECK_ACK;
else
return COMPST_DONE;
} else if ((diff > 0) && (wqe->mask & WR_ATOMIC_OR_READ_MASK)) {
@@ -228,6 +232,10 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
+ /* Check NAK code to handle a remote error */
+ if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE)
+ break;
+
if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
/* read retries of partial data may restart from