On Fri, Jun 30, 2023 at 10:33:38AM -0500, Bob Pearson wrote:
> On 6/29/23 18:18, Jason Gunthorpe wrote:
> > On Thu, Jun 29, 2023 at 05:30:24PM -0500, Bob Pearson wrote:
> >> Currently the lookup of an object from its index and taking a
> >> reference to the object are incorrectly protected by an rcu_read_lock
> >> but this does not make the xa_load and the kref_get combination an
> >> atomic operation.
> >>
> >> The various write operations need to share the xarray state in a
> >> mixture of user, soft irq and hard irq contexts so the xa_locking
> >> must support that.
> >>
> >> This patch replaces the xa locks with xa_lock_irqsave.
> >>
> >> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_pool.c | 24 ++++++++++++++++++------
> >>  1 file changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index 6215c6de3a84..f2b586249793 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -119,8 +119,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
> >>  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  		      bool sleepable)
> >>  {
> >> -	int err;
> >> +	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	gfp_t gfp_flags;
> >> +	int err;
> >>
> >>  	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> >>  		goto err_cnt;
> >> @@ -138,8 +140,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>
> >>  	if (sleepable)
> >>  		might_sleep();
> >> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
> >> +	xa_lock_irqsave(xa, flags);
> >> +	err = __xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
> >>  			      &pool->next, gfp_flags);
> >> +	xa_unlock_irqrestore(xa, flags);
> >
> > This stuff doesn't make any sense, the non __ versions already take
> > the xa_lock internally.
> >
> > Or is this because you need the save/restore version for some reason?
> > But that seems unrelated and there should be a lockdep oops to go
> > along with it showing the backtrace??
>
> The background here is that we are testing a 256 node system with
> the Lustre file system and doing fail-over-fail-back testing which
> is very high stress. This has uncovered several bugs where this is
> just one.
>
> The logic is 1st need to lock the lookup in rxe_pool_get_index()
> then when we tried to run with ordinary spin_locks we got lots of
> deadlocks. These are related to taking spin locks while in (soft
> irq) interrupt mode. In theory we could also get called in hard irq
> mode so might as well convert the locks to spin_lock_irqsave() which
> is safe in all cases.

That should be its own patch with justification..

> >> @@ -154,15 +158,16 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
> >>  {
> >>  	struct rxe_pool_elem *elem;
> >>  	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	void *obj;
> >>
> >> -	rcu_read_lock();
> >> +	xa_lock_irqsave(xa, flags);
> >>  	elem = xa_load(xa, index);
> >>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
> >>  		obj = elem->obj;
> >>  	else
> >>  		obj = NULL;
> >> -	rcu_read_unlock();
> >> +	xa_unlock_irqrestore(xa, flags);
> >
> > And this should be safe as long as the object is freed via RCU, so
> > what are you trying to fix?
>
> The problem here is that rcu_read_lock only helps us if the object is
> freed with kfree_rcu. But we have no control over what rdma-core does
> and it does *not* do that for e.g. qp's.

Oh, yes that does sound right. This is another patch with this explanation.

Jason
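
To make that last point concrete, here is a minimal sketch of the pattern
under discussion (simplified names, not the actual rxe or rdma-core code):
the rcu_read_lock() based lookup is only safe if the writer defers the
final free past an RCU grace period, e.g. with kfree_rcu().

#include <linux/xarray.h>
#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct pool_elem {
	struct kref ref_cnt;
	void *obj;
	struct rcu_head rcu;
};

/* Reader side: look up by index and try to take a reference. The RCU
 * read lock only guarantees the memory behind 'elem' is not reused
 * while we inspect it, provided the free side goes through RCU.
 */
static void *pool_get(struct xarray *xa, unsigned long index)
{
	struct pool_elem *elem;
	void *obj = NULL;

	rcu_read_lock();
	elem = xa_load(xa, index);
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	rcu_read_unlock();

	return obj;
}

/* Release side: the final kref_put must defer the free past the grace
 * period. If this were a plain kfree(), a concurrent pool_get() could
 * dereference freed memory even though it holds rcu_read_lock(), which
 * is the situation described above for qp's.
 */
static void pool_release(struct kref *kref)
{
	struct pool_elem *elem = container_of(kref, struct pool_elem, ref_cnt);

	kfree_rcu(elem, rcu);
}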