Re: [PATCH for-next v13 00/10] Fix race conditions in rxe_pool

On 4/8/22 13:06, Jason Gunthorpe wrote:
> On Mon, Apr 04, 2022 at 04:50:50PM -0500, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>> driver.  They mostly relate to races between normal operations and
>> destroying objects.  This patch series
>>  - Makes several minor cleanups in rxe_pool.[ch]
>>  - Adds wait for completions to the paths in verbs APIs which destroy
>>    objects.
>>  - Changes read side locking to rcu.
>>  - Moves object cleanup code to after ref count is zero
> 
> This all seems fine to me now, except for the question about the
> tasklets
> 
> Thanks,
> Jason

There has been a long delay because of the mr = NULL bug and the locking
problems. With the following patches applied (listed last to first) I do not
see any lockdep warnings, segfaults, or anything else in dmesg for
long runs of

	pyverbs
	perftests (ib_xxx_bw, ib_xxx_lat)
	rping (node to node)
	blktests (srp)

These patches were in v13 of the "Fix race conditions" series. I will send v14 today.
8d342cb8d7ce RDMA/rxe: Cleanup rxe_pool.c

6e4c52e04bc9 RDMA/rxe: Convert read side locking to rcu

e3e46d864b98 RDMA/rxe: Stop lookup of partially built objects

e1fb6b7225d0 RDMA/rxe: Enforce IBA C11-17

2607d042376f RDMA/rxe: Move mw cleanup code to rxe_mw_cleanup()

ca082913b915 RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()

394f24ebc81b RDMA/rxe: Move qp cleanup code to rxe_qp_do_cleanup()


3fb445b66e5c RDMA/rxe: Add rxe_srq_cleanup()

4730b0ed751a RDMA/rxe: Remove IB_SRQ_INIT_MASK


These patches are already submitted
d02e7a7266cf RDMA/rxe: Fix "RDMA/rxe: Cleanup rxe_mcast.c"

569aba28f67c RDMA/rxe: Fix "Replace mr by rkey in responder resources(2)"
 (or whatever you called it)
5e74a5ecfb53 RDMA/rxe: Fix "Replace mr by rkey in responder resources"

007493744865 RDMA/rxe: Fix typo: replace paylen by payload


This patch was submitted to scsi by Bart and addresses long timeouts that
were not rxe related (the same issue also happens with siw):
cdd844a1ba45 Revert "scsi: scsi_debug: Address races following module load"

If Zhu is not OK with this, let me know what bugs remain that need fixing.

Bob


