On Sun, Aug 30, 2020 at 11:40:05AM +0300, Leon Romanovsky wrote: > -void mlx5_ib_destroy_srq(struct ib_srq *srq, struct ib_udata *udata) > +int mlx5_ib_destroy_srq(struct ib_srq *srq, struct ib_udata *udata) > { > struct mlx5_ib_dev *dev = to_mdev(srq->device); > struct mlx5_ib_srq *msrq = to_msrq(srq); > + int ret; > + > + ret = mlx5_cmd_destroy_srq(dev, &msrq->msrq); > + if (ret && udata) > + return ret; > > - mlx5_cmd_destroy_srq(dev, &msrq->msrq); > - > - if (srq->uobject) { > - mlx5_ib_db_unmap_user( > - rdma_udata_to_drv_context( > - udata, > - struct mlx5_ib_ucontext, > - ibucontext), > - &msrq->db); > - ib_umem_release(msrq->umem); > - } else { > - destroy_srq_kernel(dev, msrq); > + if (udata) { > + destroy_srq_user(srq->pd, msrq, udata); > + return 0; > } > + > + /* We are cleaning kernel resources anyway */ > + destroy_srq_kernel(dev, msrq); Oh, and this isn't right.. If we are going to leak things then we have to leak anything exposed for DMA as well, eg the fragbuf under the SRQ has to be leaked. If the HW can't guarentee it stopped doing DMA then we can't return memory under potentially active DMA back to the system. IHMO mlx5, and all the other drivers, get this wrong. Failing to eventually destroy an object is a catastrophic failure of the device. In the case of a kernel object it must always be destroyed on the first attempt. In this case the device should be killed. Disable memory access at the PCI config space, trigger a device reset, disassociate the device, and allow all destroy commands to fake-succeed. Since drivers need help to get this right, I'm wonder if we should fix this at the core level by introducing a 'your device is screwed up, kill it' callback. Then all the destroys can return failures as Gal wanted. The core logic would be something like ret = dev->ops.destroy_foo() if (is_kernel_object()) dev->ops.device_is_broken() ret = dev->ops.destroy_foo() WARN_ON(ret); Ie after 'device_is_broken' the driver must always succeed future destroys. Then we have a chance to make this work properly... mlx5 at least already has an implementation of 'device_is_broken' that does trigger success for future destroys. Jason