Re: [PATCH for-next 0/6] RDMA/rxe: Fix potential races

On 10/12/21 1:34 AM, Leon Romanovsky wrote:
> On Sun, Oct 10, 2021 at 06:59:25PM -0500, Bob Pearson wrote:
>> There are possible race conditions related to attempting to access
>> rxe pool objects at the same time as the pools or elements are being
>> freed. This series of patches addresses these races.
> 
> Can we get rid of this pool?
> 
> Thanks
> 
>>
>> Bob Pearson (6):
>>   RDMA/rxe: Make rxe_alloc() take pool lock
>>   RDMA/rxe: Copy setup parameters into rxe_pool
>>   RDMA/rxe: Save object pointer in pool element
>>   RDMA/rxe: Combine rxe_add_index with rxe_alloc
>>   RDMA/rxe: Combine rxe_add_key with rxe_alloc
>>   RDMA/rxe: Fix potential race condition in rxe_pool
>>
>>  drivers/infiniband/sw/rxe/rxe_mcast.c |   5 +-
>>  drivers/infiniband/sw/rxe/rxe_mr.c    |   1 -
>>  drivers/infiniband/sw/rxe/rxe_mw.c    |   5 +-
>>  drivers/infiniband/sw/rxe/rxe_pool.c  | 235 +++++++++++++-------------
>>  drivers/infiniband/sw/rxe/rxe_pool.h  |  67 +++-----
>>  drivers/infiniband/sw/rxe/rxe_verbs.c |  10 --
>>  6 files changed, 140 insertions(+), 183 deletions(-)
>>
>> -- 
>> 2.30.2
>>

Not sure which 'this' you mean. This set of patches is motivated by someone at HPE
running into very infrequent seg faults when rdma packets try to copy data to or
from an MR. Other than a plain dumb bug, which doesn't seem to be the case, this can
only happen when a late packet arrives after the MR is de-registered. The root cause
is the way rxe currently defers cleaning up objects with krefs, together with potential
races between cleanup and new packets looking up rkeys. I found a lot of potential race
conditions and tried to close them off. There are another couple of patches coming as
well.
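
As a rough illustration of the kind of race described above (this is not code from
the series), the userspace sketch below shows a lookup that only hands out an object
if its reference count has not already dropped to zero, in the spirit of the kernel's
kref_get_unless_zero(). The names struct elem, table, elem_add, elem_lookup and
elem_put are hypothetical stand-ins, not rxe identifiers. The sketch covers only the
reference count check; a real implementation would also need a lock or RCU around
the table so the memory cannot be freed while a lookup is still looking at it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pool element; not the rxe structures. */
struct elem {
	atomic_int refcnt;	/* 0 means cleanup has already started */
	uint32_t key;		/* e.g. an rkey */
};

#define TABLE_SIZE 8
static struct elem *table[TABLE_SIZE];	/* toy key -> object table */

/* Take a reference only if the count is still non-zero. */
static bool get_unless_zero(struct elem *e)
{
	int old = atomic_load(&e->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded and the loop rechecks it. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void elem_add(struct elem *e)
{
	table[e->key % TABLE_SIZE] = e;
}

/* Lookup as done for a late packet: fails once cleanup has begun. */
static struct elem *elem_lookup(uint32_t key)
{
	struct elem *e = table[key % TABLE_SIZE];

	if (e == NULL || e->key != key)
		return NULL;
	return get_unless_zero(e) ? e : NULL;
}

/* Drop a reference; the last put removes the object from the table. */
static bool elem_put(struct elem *e)
{
	int old = atomic_fetch_sub(&e->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		table[e->key % TABLE_SIZE] = NULL;
		return true;	/* caller may now free the object */
	}
	return false;
}
```

With a plain unconditional increment in elem_lookup, a packet arriving between the
final put and the removal from the table would get a pointer to an object that is
already being torn down.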

This is an attempt to fix up the code the way it is now. Later I would like to use
xarrays to handle rkey indices, qpns, etc., which looks cleaner.

'Pools' is mostly a misnomer since you moved all but a couple of the allocations into
rdma-core. Really they are a way to add indices or keys to objects that are already
there.

Bob


