RE: [PATCH for-next v13 00/10] Fix race conditions in rxe_pool

Zhu,

Sorry, I was not trying to imply that you are against this, only that you are more aware of the
currently outstanding reported bugs.

Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@xxxxxxxxx> 
Sent: Wednesday, April 20, 2022 9:13 PM
To: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH for-next v13 00/10] Fix race conditions in rxe_pool

On Thu, Apr 21, 2022 at 7:04 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 4/8/22 13:06, Jason Gunthorpe wrote:
> > On Mon, Apr 04, 2022 at 04:50:50PM -0500, Bob Pearson wrote:
> > >> There are several race conditions in the current 
> >> rdma_rxe driver.  They mostly relate to races between normal 
> >> operations and destroying objects.  This patch series
> >>  - Makes several minor cleanups in rxe_pool.[ch]
> >>  - Adds waits for completion to the paths in verbs APIs that destroy
> >>    objects.
> >>  - Changes read side locking to rcu.
> >>  - Moves object cleanup code to after the ref count reaches zero
> >
> > This all seems fine to me now, except for the question about the 
> > tasklets
> >
> > Thanks,
> > Jason
>
> There has been a long delay because of the mr = NULL bug and the 
> locking problems. With the following patches applied (last to first) I 
> do not see any lockdep warnings, seg faults or anything else in dmesg 
> for long runs of
>
>         pyverbs
>         perftests (ib_xxx_bw, ib_xxx_lat)
>         rping (node to node)
>         blktests (srp)
>
> These patches were in v13 of the "Fix race conditions" patch series. I will send v14 today.
> 8d342cb8d7ce RDMA/rxe: Cleanup rxe_pool.c
> 6e4c52e04bc9 RDMA/rxe: Convert read side locking to rcu
> e3e46d864b98 RDMA/rxe: Stop lookup of partially built objects
> e1fb6b7225d0 RDMA/rxe: Enforce IBA C11-17
> 2607d042376f RDMA/rxe: Move mw cleanup code to rxe_mw_cleanup()
> ca082913b915 RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()
> 394f24ebc81b RDMA/rxe: Move qp cleanup code to rxe_qp_do_cleanup()
> 3fb445b66e5c RDMA/rxe: Add rxe_srq_cleanup()
> 4730b0ed751a RDMA/rxe: Remove IB_SRQ_INIT_MASK
>
> These patches have already been submitted:
> d02e7a7266cf RDMA/rxe: Fix "RDMA/rxe: Cleanup rxe_mcast.c"
> 569aba28f67c RDMA/rxe: Fix "Replace mr by rkey in responder resources(2)"
>  or whatever you called it.
> 5e74a5ecfb53 RDMA/rxe: Fix "Replace mr by rkey in responder resources"
> 007493744865 RDMA/rxe: Fix typo: replace paylen by payload
>
> This patch was submitted to scsi by Bart and addresses long timeouts 
> that were not rxe related (the same issue also happens with siw):
> cdd844a1ba45 Revert "scsi: scsi_debug: Address races following module load"
>
> If Zhu is not OK with this, let me know what bugs remain that need fixing.

How did you come to the conclusion that I am not OK with this?

Zhu Yanjun

>
> Bob
