On Tue, Aug 25, 2020 at 03:12:07PM +0300, Gal Pressman wrote: > On 25/08/2020 14:52, Jason Gunthorpe wrote: > > On Tue, Aug 25, 2020 at 11:13:25AM +0300, Gal Pressman wrote: > >> On 24/08/2020 13:32, Leon Romanovsky wrote: > >>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h > >>> index 1889dd172a25..8547f9d543df 100644 > >>> +++ b/drivers/infiniband/hw/efa/efa.h > >>> @@ -134,7 +134,7 @@ int efa_query_gid(struct ib_device *ibdev, u8 port, int index, > >>> int efa_query_pkey(struct ib_device *ibdev, u8 port, u16 index, > >>> u16 *pkey); > >>> int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); > >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); > >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); > >>> int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata); > >>> struct ib_qp *efa_create_qp(struct ib_pd *ibpd, > >>> struct ib_qp_init_attr *init_attr, > >>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c > >>> index 3f7f19b9f463..660a69943e02 100644 > >>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c > >>> @@ -383,13 +383,14 @@ int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) > >>> return err; > >>> } > >>> > >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) > >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) > >>> { > >>> struct efa_dev *dev = to_edev(ibpd->device); > >>> struct efa_pd *pd = to_epd(ibpd); > >>> > >>> ibdev_dbg(&dev->ibdev, "Dealloc pd[%d]\n", pd->pdn); > >>> efa_pd_dealloc(dev, pd->pdn); > >>> + return 0; > >>> } > >> > >> Nice change, thanks Leon. > >> At least for EFA, I prefer to return the return value of the destroy command > >> instead of silently ignoring it (same for the other patches). > > > > Drivers can't fail the destroy unless a future destroy will succeed. > > it breaks everything to do that. > > What does it break? It will break ucontext cleanup flow exactly where you pointed, e.g. uverbs_destroy_ufile_hw(). We release all objects and memory associated with that ucontext to ensure that no memory leaks. > > > That is why the return codes were removed. > > > > Unfortunately there are cases where future destroy will succeed, so > > here we are :( > > I replied in the other thread as well, we can continue the discussion here. > > I agree that drivers shouldn't fail destroy commands, but you know.. bugs/errors > happen (especially when dealing with hardware), and we have a way to propagate > them, why do it for only some of the drivers? Because DevX flow guarantees that this second call will succeed. > > These kinds of things aren't really expected so I don't mind what we end up > with, but in the rare cases they do I would prefer not to ignore such errors, > and yea, give the destroy command another shot when the process is cleaned up. > > > If the chip fails a destroy when it should not then it has failed and > > should be disabled at PCI and reset, continuing to free anyhow. > > How do we reset the device when there are active apps using it? I think that it is not different than any PCI error handling. Thanks