On Tue, Jun 8, 2021 at 12:14 AM Pearson, Robert B <rpearsonhpe@xxxxxxxxx> wrote:
>
>
> On 6/7/2021 6:12 AM, Zhu Yanjun wrote:
> > On Mon, Jun 7, 2021 at 7:03 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >> On Mon, Jun 07, 2021 at 04:16:37PM +0800, Zhu Yanjun wrote:
> >>> On Sat, Jun 5, 2021 at 7:07 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
> >>>> Currently the rdma_rxe driver attempts to protect atomic responder
> >>>> resources by taking a reference to the qp which is only freed when the
> >>>> resource is recycled for a new read or atomic operation. This means that
> >>>> in normal circumstances there is almost always an extra qp reference
> >>>> once an atomic operation has been executed which prevents cleaning up
> >>>> the qp and associated pd and cqs when the qp is destroyed.
> >>>>
> >>>> This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
> >>>> call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
> >>> Not sure if it is a good way to fix this problem by removing the call
> >>> to rxe_add_ref.
> >>> Because taking a reference to the qp is to protect atomic responder resources.
> >>>
> >>> Removing rxe_add_ref is to decrease the protection of the atomic
> >>> responder resources.
> >> All those rxe_add_ref/rxe_drop_ref in RXE are horrid. It will be good to delete them all.
> >>
> > I made tests with this commit. After this commit is applied, this
> > problem disappeared.
> You were testing MW when you saw this bug. Does that mean that now MW is
> working for you?

Your MW patches are huge. After applying them, I found two problems in
my test environment.

So, IMO, could you send your MW test cases to rdma-core? Then we can
verify these MW patches with them. You mentioned these MW test cases in
previous mails.

Thanks a lot.
Zhu Yanjun

> >
> > Zhu Yanjun
> >
> >> Thanks