On Mon, Sep 07, 2020 at 03:09:15PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>
> The HW release can fail and leave the system in limbo state,
> where SRQ is removed from the table, but can't be destroyed later.
> In every reentry, the initial xa_erase_irq() check will fail.
>
> Rewrite the erase logic to keep index, but don't store the entry
> itself. By doing it, we can safely reinsert entry back in the case
> of destroy failure.
>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>  drivers/infiniband/hw/mlx5/srq_cmd.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
> index 37aaacebd3f2..c6b32b15c3f2 100644
> +++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
> @@ -596,13 +596,22 @@ void mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
>  	struct mlx5_core_srq *tmp;
>  	int err;
>
> -	tmp = xa_erase_irq(&table->array, srq->srqn);
> -	if (!tmp || tmp != srq)
> +	/* Delete entry, but leave index occupied */
> +	tmp = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0);
> +	if (WARN_ON(!tmp || tmp != srq) || xa_err(tmp))

This is just WARN_ON(xa_err(tmp))

xa_cmpxchg will fail if tmp != srq and srq != NULL or we already crashed

Jason
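
For readers following the thread, the "keep the index, drop the entry, reinsert on failure" idea from the commit message can be sketched in plain userspace C. This is only a toy model under stated assumptions, not the kernel XArray API and not the code in the patch: `table`, `ZERO_ENTRY`, `slot_cmpxchg()`, `hw_destroy()`, `destroy_srq()` and the return codes are all hypothetical names invented for illustration, and clearing the slot after a successful release is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct srq {
	unsigned int srqn;
};

/* Hypothetical sentinel: the index is reserved but holds no entry. */
static char zero_marker;
#define ZERO_ENTRY ((void *)&zero_marker)

/* Toy stand-in for the SRQ table, indexed by srqn. */
static void *table[TABLE_SIZE];

/*
 * Store 'new_entry' at 'index' only if the slot currently holds 'old'.
 * Always returns the previous contents, so the caller can tell whether
 * the exchange took place by comparing the result with 'old'.
 */
static void *slot_cmpxchg(unsigned int index, void *old, void *new_entry)
{
	void *cur = table[index];

	if (cur == old)
		table[index] = new_entry;
	return cur;
}

/* Hypothetical hardware release; 'fail' forces the error path. */
static int hw_destroy(const struct srq *srq, int fail)
{
	(void)srq;
	return fail ? -1 : 0;
}

/*
 * Returns 0 on success, -1 if the hardware release failed (entry is
 * restored), -2 if the slot did not hold this SRQ in the first place.
 */
static int destroy_srq(struct srq *srq, int fail)
{
	void *tmp;

	assert(srq->srqn < TABLE_SIZE);

	/* Drop the entry, but leave the index occupied by the sentinel. */
	tmp = slot_cmpxchg(srq->srqn, srq, ZERO_ENTRY);
	if (tmp != srq)
		return -2;

	if (hw_destroy(srq, fail)) {
		/* Release failed: put the entry back so a retry can find it. */
		slot_cmpxchg(srq->srqn, ZERO_ENTRY, srq);
		return -1;
	}

	/* Release succeeded: free the index (assumed final step). */
	slot_cmpxchg(srq->srqn, ZERO_ENTRY, NULL);
	return 0;
}
```

In this model a failed `destroy_srq()` leaves the table exactly as it was before the call, so a later retry passes the initial check again, which is the limbo state the commit message sets out to avoid.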