On Thu, Apr 21, 2022 at 7:04 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 4/8/22 13:06, Jason Gunthorpe wrote:
> > On Mon, Apr 04, 2022 at 04:50:50PM -0500, Bob Pearson wrote:
> >> There are several race conditions discovered in the current rdma_rxe
> >> driver. They mostly relate to races between normal operations and
> >> destroying objects. This patch series
> >> - Makes several minor cleanups in rxe_pool.[ch]
> >> - Adds wait for completions to the paths in verbs APIs which destroy
> >>   objects.
> >> - Changes read side locking to rcu.
> >> - Moves object cleanup code to after ref count is zero
> >
> > This all seems fine to me now, except for the question about the
> > tasklets
> >
> > Thanks,
> > Jason
>
> There has been a long delay because of the mr = NULL bug and the locking
> problems. With the following patches applied (last to first) I do not
> see any lockdep warnings, seg faults or anything else in dmesg for
> long runs of
>
>     pyverbs
>     perftests (ib_xxx_bw, ib_xxx_lat)
>     rping (node to node)
>     blktests (srp)
>
> These patches were in v13 of the "Fix race conditions" patch. I will send v14 today.
>     8d342cb8d7ce RDMA/rxe: Cleanup rxe_pool.c
>     6e4c52e04bc9 RDMA/rxe: Convert read side locking to rcu
>     e3e46d864b98 RDMA/rxe: Stop lookup of partially built objects
>     e1fb6b7225d0 RDMA/rxe: Enforce IBA C11-17
>     2607d042376f RDMA/rxe: Move mw cleanup code to rxe_mw_cleanup()
>     ca082913b915 RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()
>     394f24ebc81b RDMA/rxe: Move qp cleanup code to rxe_qp_do_cleanup()
>
>     3fb445b66e5c RDMA/rxe: Add rxe_srq_cleanup()
>     4730b0ed751a RDMA/rxe: Remove IB_SRQ_INIT_MASK
>
> These patches are already submitted
>     d02e7a7266cf RDMA/rxe: Fix "RDMA/rxe: Cleanup rxe_mcast.c"
>     569aba28f67c RDMA/rxe: Fix "Replace mr by rkey in responder resources(2)"
>                  or whatever you called it.
>     5e74a5ecfb53 RDMA/rxe: Fix "Replace mr by rkey in responder resources"
>     007493744865 RDMA/rxe: Fix typo: replace paylen by payload
>
> This patch was submitted to scsi by Bart and addressed long timeouts that
> were not rxe related (same issue also happens with siw)
>     cdd844a1ba45 Revert "scsi: scsi_debug: Address races following module load"
>
> If Zhu is not OK with this let know what bugs remain that need fixing.

How do you get this conclusion that I am not OK with this?

Zhu Yanjun

>
> Bob