On 25/08/2020 14:52, Jason Gunthorpe wrote: > On Tue, Aug 25, 2020 at 11:13:25AM +0300, Gal Pressman wrote: >> On 24/08/2020 13:32, Leon Romanovsky wrote: >>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h >>> index 1889dd172a25..8547f9d543df 100644 >>> +++ b/drivers/infiniband/hw/efa/efa.h >>> @@ -134,7 +134,7 @@ int efa_query_gid(struct ib_device *ibdev, u8 port, int index, >>> int efa_query_pkey(struct ib_device *ibdev, u8 port, u16 index, >>> u16 *pkey); >>> int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata); >>> int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata); >>> struct ib_qp *efa_create_qp(struct ib_pd *ibpd, >>> struct ib_qp_init_attr *init_attr, >>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c >>> index 3f7f19b9f463..660a69943e02 100644 >>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c >>> @@ -383,13 +383,14 @@ int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) >>> return err; >>> } >>> >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) >>> { >>> struct efa_dev *dev = to_edev(ibpd->device); >>> struct efa_pd *pd = to_epd(ibpd); >>> >>> ibdev_dbg(&dev->ibdev, "Dealloc pd[%d]\n", pd->pdn); >>> efa_pd_dealloc(dev, pd->pdn); >>> + return 0; >>> } >> >> Nice change, thanks Leon. >> At least for EFA, I prefer to return the return value of the destroy command >> instead of silently ignoring it (same for the other patches). > > Drivers can't fail the destroy unless a future destroy will succeed. > it breaks everything to do that. What does it break? > That is why the return codes were removed. > > Unfortunately there are cases where future destroy will succeed, so > here we are :( I replied in the other thread as well, we can continue the discussion here. I agree that drivers shouldn't fail destroy commands, but you know.. bugs/errors happen (especially when dealing with hardware), and we have a way to propagate them, why do it for only some of the drivers? These kinds of things aren't really expected so I don't mind what we end up with, but in the rare cases they do I would prefer not to ignore such errors, and yea, give the destroy command another shot when the process is cleaned up. > If the chip fails a destroy when it should not then it has failed and > should be disabled at PCI and reset, continuing to free anyhow. How do we reset the device when there are active apps using it?