Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD deallocate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 25, 2020 at 03:12:07PM +0300, Gal Pressman wrote:
> On 25/08/2020 14:52, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2020 at 11:13:25AM +0300, Gal Pressman wrote:
> >> On 24/08/2020 13:32, Leon Romanovsky wrote:
> >>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
> >>> index 1889dd172a25..8547f9d543df 100644
> >>> +++ b/drivers/infiniband/hw/efa/efa.h
> >>> @@ -134,7 +134,7 @@ int efa_query_gid(struct ib_device *ibdev, u8 port, int index,
> >>>  int efa_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
> >>>  		   u16 *pkey);
> >>>  int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>>  int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
> >>>  struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >>>  			    struct ib_qp_init_attr *init_attr,
> >>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
> >>> index 3f7f19b9f463..660a69943e02 100644
> >>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> >>> @@ -383,13 +383,14 @@ int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>>  	return err;
> >>>  }
> >>>
> >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>>  {
> >>>  	struct efa_dev *dev = to_edev(ibpd->device);
> >>>  	struct efa_pd *pd = to_epd(ibpd);
> >>>
> >>>  	ibdev_dbg(&dev->ibdev, "Dealloc pd[%d]\n", pd->pdn);
> >>>  	efa_pd_dealloc(dev, pd->pdn);
> >>> +	return 0;
> >>>  }
> >>
> >> Nice change, thanks Leon.
> >> At least for EFA, I prefer to return the return value of the destroy command
> >> instead of silently ignoring it (same for the other patches).
> >
> > Drivers can't fail the destroy unless a future destroy will succeed.
> > it breaks everything to do that.
>
> What does it break?

It will break ucontext cleanup flow exactly where you pointed, e.g.
uverbs_destroy_ufile_hw(). We release all objects and memory associated
with that ucontext to ensure that no memory leaks.

>
> > That is why the return codes were removed.
> >
> > Unfortunately there are cases where future destroy will succeed, so
> > here we are :(
>
> I replied in the other thread as well, we can continue the discussion here.
>
> I agree that drivers shouldn't fail destroy commands, but you know.. bugs/errors
> happen (especially when dealing with hardware), and we have a way to propagate
> them, why do it for only some of the drivers?

Because DevX flow guarantees that this second call will succeed.

>
> These kinds of things aren't really expected so I don't mind what we end up
> with, but in the rare cases they do I would prefer not to ignore such errors,
> and yea, give the destroy command another shot when the process is cleaned up.
>
> > If the chip fails a destroy when it should not then it has failed and
> > should be disabled at PCI and reset, continuing to free anyhow.
>
> How do we reset the device when there are active apps using it?

I think that it is not different than any PCI error handling.

Thanks



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux