On 7/21/23 13:04, Bob Pearson wrote:
> If a send packet is dropped by the IP layer in rxe_requester(),
> the call to rxe_xmit_packet() can fail with err == -EAGAIN.
> To recover, the state of the wqe is restored to the state before
> the packet was sent so it can be resent. However, the routines
> that save and restore the state miss a significant part of the
> variable state in the wqe: the dma struct, which is used to walk
> through the sge table. Also, the state is not saved before the
> packet is built, and building the packet modifies the dma struct.
>
> Under heavy stress testing, with many QPs on a fast node sending
> large messages to a slow node, dropped packets are observed and
> the resent packets are corrupted because the dma struct was not
> restored. This patch fixes this behavior and allows the test cases
> to succeed.
>
> Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 47 ++++++++++++++++-------------
>  1 file changed, 26 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 76d85731a45f..10ff2715d9bb 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -708,10 +708,11 @@ static void save_state(struct rxe_send_wqe *wqe,
>  		       struct rxe_send_wqe *rollback_wqe,
>  		       u32 *rollback_psn)
>  {
> -	rollback_wqe->state     = wqe->state;
> +	rollback_wqe->state = wqe->state;
>  	rollback_wqe->first_psn = wqe->first_psn;
> -	rollback_wqe->last_psn  = wqe->last_psn;
> -	*rollback_psn           = qp->req.psn;
> +	rollback_wqe->last_psn = wqe->last_psn;
> +	rollback_wqe->dma = wqe->dma;
> +	*rollback_psn = qp->req.psn;
>  }
>
>  static void rollback_state(struct rxe_send_wqe *wqe,
> @@ -719,10 +720,11 @@ static void rollback_state(struct rxe_send_wqe *wqe,
>  			   struct rxe_send_wqe *rollback_wqe,
>  			   u32 rollback_psn)
>  {
> -	wqe->state     = rollback_wqe->state;
> +	wqe->state = rollback_wqe->state;
>  	wqe->first_psn = rollback_wqe->first_psn;
> -	wqe->last_psn  = rollback_wqe->last_psn;
> -	qp->req.psn    = rollback_psn;
> +	wqe->last_psn = rollback_wqe->last_psn;
> +	wqe->dma = rollback_wqe->dma;
> +	qp->req.psn = rollback_psn;
>  }
>
>  static void update_state(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
> @@ -922,6 +924,9 @@ int rxe_requester(struct rxe_qp *qp)
>  		payload = mtu;
>  	}
>
> +	/* save wqe state before we build and send packet */
> +	save_state(wqe, qp, &rollback_wqe, &rollback_psn);
> +
>  	skb = rxe_init_req_packet(qp, wqe, opcode, payload, &pkt);
>  	if (unlikely(!skb)) {
>  		rxe_err_qp(qp, "failed to create packet for %s",
> @@ -929,29 +934,29 @@ int rxe_requester(struct rxe_qp *qp)
>  		goto err;
>  	}
>
> -	/*
> -	 * To prevent a race on wqe access between requester and completer,
> -	 * wqe members state and psn need to be set before calling
> -	 * rxe_xmit_packet().
> -	 * Otherwise, completer might initiate an unjustified retry flow.
> -	 */
> -	save_state(wqe, qp, &rollback_wqe, &rollback_psn);
> +	/* update wqe state as though we had sent it */
>  	update_wqe_state(qp, wqe, &pkt);
>  	update_wqe_psn(qp, wqe, &pkt, payload);
>
>  	err = rxe_xmit_packet(qp, &pkt, skb);
>  	if (err) {
> -		qp->need_req_skb = 1;
> +		if (err != -EAGAIN) {
> +			wqe->status = IB_WC_LOC_QP_OP_ERR;
> +			goto err;
> +		}
>
> +		/* the packet was dropped so reset wqe to the state
> +		 * before we sent it so we can try to resend
> +		 */
>  		rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
>
> -		if (err == -EAGAIN) {
> -			rxe_sched_task(&qp->req.task);
> -			goto exit;
> -		}
> +		/* force a delay until the dropped packet is freed and
> +		 * the send queue is drained below the low water mark
> +		 */
> +		qp->need_req_skb = 1;
>
> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
> -		goto err;
> +		rxe_sched_task(&qp->req.task);
> +		goto exit;
>  	}
>
>  	update_state(qp, &pkt);

Never mind. Missing some prereq patches.
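
As an aside for anyone reading along: the save/rollback idea above is easy to
model outside the kernel. Below is a minimal userspace C sketch of the pattern.
It is an illustration only; dma_info, send_wqe, build_packet and xmit_packet
are made-up stand-ins, not the real rxe structures or functions. It just shows
why the dma cursor must be captured before the build step and restored on
-EAGAIN.

#include <errno.h>
#include <stdio.h>

/* Illustrative stand-in for the sge-walk cursor kept in the wqe. */
struct dma_info {
	unsigned int cur_sge;		/* which sge we are in */
	unsigned int sge_offset;	/* offset within it */
	unsigned int resid;		/* bytes left to send */
};

/* Illustrative stand-in for struct rxe_send_wqe. */
struct send_wqe {
	int state;
	unsigned int first_psn;
	unsigned int last_psn;
	struct dma_info dma;
};

/* Building a packet consumes part of the dma cursor, which is why
 * the snapshot must be taken before this step, not after.
 */
static void build_packet(struct send_wqe *wqe, unsigned int payload)
{
	wqe->dma.sge_offset += payload;
	wqe->dma.resid -= payload;
}

/* Stub transmit: drop the first attempt, as the IP layer might. */
static int xmit_packet(int attempt)
{
	return attempt == 0 ? -EAGAIN : 0;
}

int main(void)
{
	struct send_wqe wqe = {
		.state = 1, .first_psn = 100, .last_psn = 100,
		.dma = { .cur_sge = 0, .sge_offset = 0, .resid = 4096 },
	};
	unsigned int psn = 100;

	for (int attempt = 0;; attempt++) {
		/* Snapshot *all* mutable wqe state, dma included; the
		 * bug was snapshotting too little and too late.
		 */
		struct send_wqe rollback_wqe = wqe;
		unsigned int rollback_psn = psn;

		build_packet(&wqe, 1024);
		psn++;

		int err = xmit_packet(attempt);
		if (err == -EAGAIN) {
			/* Dropped: restore so the resend rebuilds from
			 * exactly the same sge position and psn.
			 */
			wqe = rollback_wqe;
			psn = rollback_psn;
			continue;
		} else if (err) {
			return 1;
		}
		break;
	}

	/* Prints resid=3072 psn=101: one packet sent, one rolled back. */
	printf("sent ok: resid=%u psn=%u\n", wqe.dma.resid, psn);
	return 0;
}

Leaving the dma field out of the snapshot reproduces the corruption the commit
message describes: the retry would resume mid-sge and rebuild a different
payload for the same psn.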