On Fri, Jun 30, 2023 at 10:33:38AM -0500, Bob Pearson wrote:
> On 6/29/23 18:18, Jason Gunthorpe wrote:
> > On Thu, Jun 29, 2023 at 05:30:24PM -0500, Bob Pearson wrote:
> >> Currently the lookup of an object from its index and taking a
> >> reference to the object are incorrectly protected by an rcu_read_lock
> >> but this does not make the xa_load and the kref_get combination an
> >> atomic operation.
> >>
> >> The various write operations need to share the xarray state in a
> >> mixture of user, soft irq and hard irq contexts so the xa_locking
> >> must support that.
> >>
> >> This patch replaces the xa locks with xa_lock_irqsave.
> >>
> >> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_pool.c | 24 ++++++++++++++++++------
> >>  1 file changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index 6215c6de3a84..f2b586249793 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -119,8 +119,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
> >>  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  		      bool sleepable)
> >>  {
> >> -	int err;
> >> +	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	gfp_t gfp_flags;
> >> +	int err;
> >>
> >>  	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> >>  		goto err_cnt;
> >> @@ -138,8 +140,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>
> >>  	if (sleepable)
> >>  		might_sleep();
> >> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
> >> +	xa_lock_irqsave(xa, flags);
> >> +	err = __xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
> >>  			      &pool->next, gfp_flags);
> >> +	xa_unlock_irqrestore(xa, flags);
> >
> > This stuff doesn't make any sense, the non __ versions already take
> > the xa_lock internally.
> >
> > Or is this because you need the save/restore version for some reason?
> > But that seems unrelated and there should be a lockdep oops to go
> > along with it showing the backtrace??
>
> The background here is that we are testing a 256 node system with
> the Lustre file system and doing fail-over-fail-back testing which
> is very high stress. This has uncovered several bugs where this is
> just one.
>
> The logic is 1st need to lock the lookup in rxe_pool_get_index()
> then when we tried to run with ordinary spin_locks we got lots of
> deadlocks. These are related to taking spin locks while in (soft
> irq) interrupt mode. In theory we could also get called in hard irq
> mode so might as well convert the locks to spin_lock_irqsave() which
> is safe in all cases.

That should be its own patch with justification..

> >> @@ -154,15 +158,16 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
> >>  {
> >>  	struct rxe_pool_elem *elem;
> >>  	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	void *obj;
> >>
> >> -	rcu_read_lock();
> >> +	xa_lock_irqsave(xa, flags);
> >>  	elem = xa_load(xa, index);
> >>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
> >>  		obj = elem->obj;
> >>  	else
> >>  		obj = NULL;
> >> -	rcu_read_unlock();
> >> +	xa_unlock_irqrestore(xa, flags);
> >
> > And this should be safe as long as the object is freed via RCU, so
> > what are you trying to fix?
>
> The problem here is that rcu_read_lock only helps us if the object is
> freed with kfree_rcu. But we have no control over what rdma-core does
> and it does *not* do that for e.g. qp's.

Oh, yes that does sound right. This is another patch with this explanation.

Jason
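
To make that last point concrete, here is a minimal sketch of the pattern
under discussion (simplified names, not the actual rxe or rdma-core code):
the rcu_read_lock() based lookup is only safe if the writer defers the
final free past an RCU grace period, e.g. with kfree_rcu().

#include <linux/xarray.h>
#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct pool_elem {
	struct kref ref_cnt;
	void *obj;
	struct rcu_head rcu;
};

/* Reader side: look up by index and try to take a reference. The RCU
 * read lock only guarantees the memory behind 'elem' is not reused
 * while we inspect it, provided the free side goes through RCU.
 */
static void *pool_get(struct xarray *xa, unsigned long index)
{
	struct pool_elem *elem;
	void *obj = NULL;

	rcu_read_lock();
	elem = xa_load(xa, index);
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	rcu_read_unlock();

	return obj;
}

/* Release side: the final kref_put must defer the free past the grace
 * period. If this were a plain kfree(), a concurrent pool_get() could
 * dereference freed memory even though it holds rcu_read_lock(), which
 * is the situation described above for qp's.
 */
static void pool_release(struct kref *kref)
{
	struct pool_elem *elem = container_of(kref, struct pool_elem, ref_cnt);

	kfree_rcu(elem, rcu);
}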