On 6/4/21 6:05 PM, Bob Pearson wrote: > Currently the rdma_rxe driver attempts to protect atomic responder > resources by taking a reference to the qp which is only freed when the > resource is recycled for a new read or atomic operation. This means that > in normal circumstances there is almost always an extra qp reference > once an atomic operation has been executed which prevents cleaning up > the qp and associated pd and cqs when the qp is destroyed. > > This patch removes the call to rxe_add_ref() in send_atomic_ack() and the > call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is > destroyed while a peer is retrying an atomic op it will cause the > operation to fail which is acceptable. > > Reported-by: Zhu Yanjun <zyjzyj2000@xxxxxxxxx> > Fixes: 86af61764151 ("IB/rxe: remove unnecessary skb_clone") > Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> > --- > drivers/infiniband/sw/rxe/rxe_qp.c | 1 - > drivers/infiniband/sw/rxe/rxe_resp.c | 2 -- > 2 files changed, 3 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c > index 34ae957a315c..b6d83d82e4f9 100644 > --- a/drivers/infiniband/sw/rxe/rxe_qp.c > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c > @@ -136,7 +136,6 @@ static void free_rd_atomic_resources(struct rxe_qp *qp) > void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res) > { > if (res->type == RXE_ATOMIC_MASK) { > - rxe_drop_ref(qp); > kfree_skb(res->atomic.skb); > } else if (res->type == RXE_READ_MASK) { > if (res->read.mr) > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index 2b220659bddb..39dc39be586e 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -966,8 +966,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt, > goto out; > } > > - rxe_add_ref(qp); > - > res = &qp->resp.resources[qp->resp.res_head]; > free_rd_atomic_resource(qp, res); > rxe_advance_resp_resource(qp); > There is another way to fix this which is to add cleanup_rd_atomic_resources to rxe_qp_destroy() but it has the exact same behavior because it removes the responder resource causing a retried operation to fail. You can't add it to the qp cleanup routines because you never get there until the ref count goes to zero. I think the above is the cleanest solution. Bob