Re: [PATCH for-next] RDMA/rxe: Fix qp reference counting for atomic ops

On 6/7/2021 8:39 PM, Zhu Yanjun wrote:
On Tue, Jun 8, 2021 at 12:14 AM Pearson, Robert B <rpearsonhpe@xxxxxxxxx> wrote:

On 6/7/2021 6:12 AM, Zhu Yanjun wrote:
On Mon, Jun 7, 2021 at 7:03 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
On Mon, Jun 07, 2021 at 04:16:37PM +0800, Zhu Yanjun wrote:
On Sat, Jun 5, 2021 at 7:07 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference
once an atomic operation has been executed which prevents cleaning up
the qp and associated pd and cqs when the qp is destroyed.
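The leak described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_qp`, `toy_resource`, and the `toy_*` helpers are hypothetical stand-ins, not the rxe structures or the real rxe_add_ref()/rxe_drop_ref(). It models a single responder resource that pins the qp until the resource is recycled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver objects; not the real rxe structs. */
struct toy_qp {
	int refcount;	/* starts at 1: the reference held by the qp's owner */
	bool freed;	/* set when the last reference is dropped */
};

struct toy_resource {
	struct toy_qp *qp;	/* non-NULL while the resource pins the qp */
};

static void toy_add_ref(struct toy_qp *qp)
{
	qp->refcount++;
}

static void toy_drop_ref(struct toy_qp *qp)
{
	assert(qp->refcount > 0);
	if (--qp->refcount == 0)
		qp->freed = true;
}

/*
 * Recycle the responder resource for a new atomic operation: release the
 * reference taken by the previous use, then (optionally) take a new one.
 */
static void toy_use_resource(struct toy_resource *res, struct toy_qp *qp,
			     bool pin_qp)
{
	if (res->qp) {
		toy_drop_ref(res->qp);
		res->qp = NULL;
	}
	if (pin_qp) {
		toy_add_ref(qp);
		res->qp = qp;
	}
}

/*
 * Run n_atomics atomic operations, then "destroy" the qp by dropping the
 * owner's reference. Returns the number of references still outstanding.
 */
static int toy_refs_left_after_destroy(int n_atomics, bool pin_qp)
{
	struct toy_qp qp = { .refcount = 1, .freed = false };
	struct toy_resource res = { .qp = NULL };

	for (int i = 0; i < n_atomics; i++)
		toy_use_resource(&res, &qp, pin_qp);

	toy_drop_ref(&qp);	/* destroy: the owner drops its reference */
	return qp.refcount;
}
```

In this model, toy_refs_left_after_destroy(3, true) leaves one reference outstanding, so the qp is never freed, while toy_refs_left_after_destroy(3, false) leaves none. The stale reference is only released when the resource is recycled, which never happens once the qp is being destroyed.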

This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
Not sure if it is a good way to fix this problem by removing the call
to rxe_add_ref.
Because taking a reference to the qp is to protect atomic responder resources.

Removing rxe_add_ref is to decrease the protection of the atomic
responder resources.
All those rxe_add_ref/rxe_drop_ref in RXE are horrid. It will be good to delete them all.

I made tests with this commit. After this commit is applied, this
problem disappeared.
You were testing MW when you saw this bug. Does that mean that now MW is
working for you?
Your MW patches are huge. After these patches are applied, I found 2
problems in my test environment.

The trace you showed looked like the pyverbs tests all passed and then there were leaked QP/PD/CQ. I also saw those. After fixing the QP reference count bug (not in MW) I did not see any errors from the pyverbs tests of MW. Or any other errors for that matter. What was the other problem? Was that the memory barrier one (also not in MW)?

Mostly I want to know if you currently see any errors in the kernel related to MW. The test case bug (in test_qpex.py) is a separate issue that is not an rxe bug at all.

Bob

So IMO, can you send the test cases about MW to rdma-core? So we can
verify these MW patches with them.

In previous mails, you mentioned these MW test cases.

Thanks a lot.
Zhu Yanjun

Zhu Yanjun

Thanks


