On Fri, Jul 21, 2023 at 03:07:49PM -0500, Bob Pearson wrote:
> If a send packet is dropped by the IP layer in rxe_requester()
> the call to rxe_xmit_packet() can fail with err == -EAGAIN.
> To recover, the state of the wqe is restored to the state before
> the packet was sent so it can be resent. However, the routines
> that save and restore the state miss a significant part of the
> variable state in the wqe, the dma struct which is used to process
> through the sge table. And, the state is not saved before the packet
> is built which modifies the dma struct.
>
> Under heavy stress testing with many QPs on a fast node sending
> large messages to a slow node dropped packets are observed and
> the resent packets are corrupted because the dma struct was not
> restored. This patch fixes this behavior and allows the test cases
> to succeed.
>
> Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> ---
> v2:
>   Rebased to for-next
>
>  drivers/infiniband/sw/rxe/rxe_req.c | 45 ++++++++++++++++-------------
>  1 file changed, 25 insertions(+), 20 deletions(-)

Applied to for-next, thanks

Jason
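For readers following the thread, here is a minimal user-space sketch of the pattern the patch describes: snapshot the wqe state, including the dma cursor, *before* the packet is built, and roll it back if transmission fails with -EAGAIN. All names here (struct sketch_wqe, struct sketch_dma, save_state(), build_packet(), send_one(), and so on) are hypothetical simplifications for illustration only. They are not the actual code in drivers/infiniband/sw/rxe/rxe_req.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for the wqe's dma cursor into the sge table. */
struct sketch_dma {
	unsigned int resid;      /* payload bytes still to send */
	unsigned int sge_offset; /* bytes already consumed from the sge list */
};

/* Hypothetical, simplified wqe: only the mutable state matters here. */
struct sketch_wqe {
	int state;
	unsigned int psn;        /* next packet sequence number */
	struct sketch_dma dma;
};

/* Snapshot of everything build_packet() may change, dma included. */
struct sketch_saved {
	int state;
	unsigned int psn;
	struct sketch_dma dma;
};

static void save_state(const struct sketch_wqe *wqe, struct sketch_saved *s)
{
	s->state = wqe->state;
	s->psn = wqe->psn;
	s->dma = wqe->dma;   /* the part the commit message says was missed */
}

static void restore_state(struct sketch_wqe *wqe, const struct sketch_saved *s)
{
	wqe->state = s->state;
	wqe->psn = s->psn;
	wqe->dma = s->dma;
}

/* Building a packet consumes payload by advancing the dma cursor. */
static void build_packet(struct sketch_wqe *wqe, unsigned int payload)
{
	assert(payload <= wqe->dma.resid);
	wqe->dma.resid -= payload;
	wqe->dma.sge_offset += payload;
	wqe->psn++;
	wqe->state = 1;
}

/* Hypothetical transmit hooks: one that "drops" the packet, one that succeeds. */
static int xmit_drop(void) { return -EAGAIN; }
static int xmit_ok(void) { return 0; }

/* Returns 0 on success, or -EAGAIN after rolling the wqe back for a resend. */
static int send_one(struct sketch_wqe *wqe, unsigned int payload,
		    int (*xmit)(void))
{
	struct sketch_saved saved;
	int err;

	save_state(wqe, &saved);   /* save BEFORE build_packet() mutates dma */
	build_packet(wqe, payload);
	err = xmit();
	if (err == -EAGAIN)
		restore_state(wqe, &saved);
	return err;
}
```

The key ordering point is that save_state() runs before build_packet(); if the snapshot were taken afterwards, or omitted the dma cursor, a resend would start from an already-advanced position in the sge list and send the wrong bytes.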