Re: [PATCH for-next] RDMA/rxe: Fix qp reference counting for atomic ops

On Tue, Jun 8, 2021 at 10:01 AM Pearson, Robert B <rpearsonhpe@xxxxxxxxx> wrote:
>
>
> On 6/7/2021 8:39 PM, Zhu Yanjun wrote:
> > On Tue, Jun 8, 2021 at 12:14 AM Pearson, Robert B <rpearsonhpe@xxxxxxxxx> wrote:
> >>
> >> On 6/7/2021 6:12 AM, Zhu Yanjun wrote:
> >>> On Mon, Jun 7, 2021 at 7:03 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >>>> On Mon, Jun 07, 2021 at 04:16:37PM +0800, Zhu Yanjun wrote:
> >>>>> On Sat, Jun 5, 2021 at 7:07 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
> >>>>>> Currently the rdma_rxe driver attempts to protect atomic responder
> >>>>>> resources by taking a reference to the qp which is only freed when the
> >>>>>> resource is recycled for a new read or atomic operation. This means that
> >>>>>> in normal circumstances there is almost always an extra qp reference
> >>>>>> once an atomic operation has been executed which prevents cleaning up
> >>>>>> the qp and associated pd and cqs when the qp is destroyed.
> >>>>>>
> >>>>>> This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
> >>>>>> call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
> >>>>> I am not sure that removing the call to rxe_add_ref is a good way
> >>>>> to fix this problem, because the reference to the qp is taken to
> >>>>> protect the atomic responder resources.
> >>>>>
> >>>>> Removing rxe_add_ref weakens the protection of the atomic
> >>>>> responder resources.
> >>>> All those rxe_add_ref/rxe_drop_ref in RXE are horrid. It will be good to delete them all.
> >>>>
> >>> I made tests with this commit. After this commit is applied, this
> >>> problem disappeared.
> >> You were testing MW when you saw this bug. Does that mean that now MW is
> >> working for you?
> > Your MW patches are huge. After these patches are applied, I found 2
> > problems in my test environment.
>
> The trace you showed looked like the pyverbs tests all passed and then
> there were leaked QP/PD/CQ. I also saw those. After fixing the QP
> reference count bug (not in MW) I did not see any errors from the
> pyverbs tests of MW. Or any other errors for that matter. What was the
> other problem? Was that the memory barrier one (also not in MW)?
>
> Mostly I want to know if you currently see any errors in the kernel
> related to MW. The test case bug (in test_qpex.py) is a separate issue

The current test cases in rdma-core just confirm a regression in RXE.

Zhu Yanjun

> that is not a rxe bug at all.
>
> Bob
>
> > So, IMO, could you send the MW test cases to rdma-core so that we
> > can verify these MW patches with them?
> >
> > In previous mails, you mentioned these MW test cases.
> >
> > Thanks a lot.
> > Zhu Yanjun
> >
> >>> Zhu Yanjun
> >>>
> >>>> Thanks
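
For readers following the thread, the leak described in the patch text can be
sketched outside the kernel. This is only an illustrative model under stated
assumptions, not the rxe code: struct demo_qp, struct demo_resource and every
demo_* / *_refs_after_destroy name below are hypothetical stand-ins. The
"leaky" path takes a qp reference for each atomic and drops it only when the
responder resource slot is recycled; the "fixed" path takes no reference at
all, which is the change the patch describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the rxe structures. */
struct demo_qp {
	int refcnt;
};

struct demo_resource {
	struct demo_qp *qp;
	bool in_use;
};

static void demo_add_ref(struct demo_qp *qp)  { qp->refcnt++; }
static void demo_drop_ref(struct demo_qp *qp) { qp->refcnt--; }

/* Old behaviour as described: the reference taken for an atomic is
 * dropped only when the slot is recycled for the next operation. */
static void leaky_handle_atomic(struct demo_qp *qp, struct demo_resource *res)
{
	if (res->in_use)
		demo_drop_ref(res->qp);	/* recycle: release previous ref */
	res->qp = qp;
	res->in_use = true;
	demo_add_ref(qp);		/* held until the next recycle */
}

/* Patched behaviour as described: no reference is taken or dropped. */
static void fixed_handle_atomic(struct demo_qp *qp, struct demo_resource *res)
{
	res->qp = qp;
	res->in_use = true;
}

/* Returns the qp refcount left after "destroy" drops the creation ref. */
int leaky_refs_after_destroy(int n_atomics)
{
	struct demo_qp qp = { .refcnt = 1 };	/* creation reference */
	struct demo_resource res = { NULL, false };

	for (int i = 0; i < n_atomics; i++)
		leaky_handle_atomic(&qp, &res);
	demo_drop_ref(&qp);			/* destroy */
	return qp.refcnt;
}

int fixed_refs_after_destroy(int n_atomics)
{
	struct demo_qp qp = { .refcnt = 1 };	/* creation reference */
	struct demo_resource res = { NULL, false };

	for (int i = 0; i < n_atomics; i++)
		fixed_handle_atomic(&qp, &res);
	demo_drop_ref(&qp);			/* destroy */
	return qp.refcnt;
}
```

In this model, any non-zero number of atomics leaves one reference outstanding
on the leaky path after destroy, while the fixed path always reaches zero. The
model says nothing about whether the responder resources still need some other
form of protection, which is the concern raised earlier in the thread.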
