Re: [PATCH rdma-next v1 03/10] RDMA/mlx5: Issue FW command to destroy SRQ on reentry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 02, 2020 at 09:31:15PM -0300, Jason Gunthorpe wrote:
> On Sun, Aug 30, 2020 at 11:40:03AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >
> > The HW release can fail and leave the system in limbo state,
> > where SRQ is removed from the table, but can't be destroyed later.
> > In every reentry, the initial xa_erase_irq() check will fail.
> >
> > Rewrite the erase logic to keep index, but don't store the entry
> > itself. By doing it, we can safely reinsert entry back in the case
> > of destroy failure and be safe from any xa_store_irq() error.
> >
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >  drivers/infiniband/hw/mlx5/srq_cmd.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
> > index 37aaacebd3f2..c6d807f04d9d 100644
> > +++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
> > @@ -596,13 +596,22 @@ void mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
> >  	struct mlx5_core_srq *tmp;
> >  	int err;
> >
> > -	tmp = xa_erase_irq(&table->array, srq->srqn);
> > -	if (!tmp || tmp != srq)
> > +	/* Delete entry, but leave index occupied */
> > +	tmp = xa_store_irq(&table->array, srq->srqn, NULL, 0);
> > +	if (WARN_ON(!tmp || tmp != srq))
> >  		return;
>
> This isn't an allocating xarray:
>
> 	xa_init_flags(&table->array, XA_FLAGS_LOCK_IRQ);
>
> So storing NULL actually does delete the entry and clean up the memory
> and the store below could fail.
>
> I think this should be written as
>
>    xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0);
>
> And the undo below would be
>
>    xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq 0);

Thanks, it is better.

>
> > +	xa_erase_irq(&table->array, srq->srqn);
>
> And this is racy since the FW could have reallocated the same srqn and
> already set it in the xarray.

Not in mlx5 devices, srqn is allocated in round-robin order.

>
> It needs to be xa_release_irq(), which looks like it needs to be
> added to match xa_reserve_irq()

We can't call to xa_lock_irq*() while issuing FW commands.

Thanks

>
> Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux