On Sun, Oct 04, 2020 at 09:32:26AM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 04, 2020 at 09:48:18AM +0300, Leon Romanovsky wrote:
> > On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > > > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > > > index 12ebacf52958..1abcb01d362f 100644
> > > > > > +++ b/drivers/infiniband/core/cq.c
> > > > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > > > >  		goto out_destroy_cq;
> > > > > >  	}
> > > > > >
> > > > > > -	rdma_restrack_add(&cq->res);
> > > > > > +	ret = rdma_restrack_add(&cq->res);
> > > > > > +	if (ret)
> > > > > > +		goto out_poll_cq;
> > > > > > +
> > > > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > > > >  	return cq;
> > > > > >
> > > > > > +out_poll_cq:
> > > > > > +	switch (cq->poll_ctx) {
> > > > > > +	case IB_POLL_SOFTIRQ:
> > > > > > +		irq_poll_disable(&cq->iop);
> > > > > > +		break;
> > > > > > +	case IB_POLL_WORKQUEUE:
> > > > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > > > +		cancel_work_sync(&cq->work);
> > > > >
> > > > > This error unwind is *technically* in the wrong order, it is wrong in
> > > > > ib_free_cq too which is an actual bug.
> > > > >
> > > > > The cq->comp_handler should be set before calling create_cq and undone
> > > > > after calling destroy_wq. We can do this right now that the
> > > > > allocations have been reworked.
> > > > >
> > > > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > > > be called after this cancel == use after free
> > > > >
> > > > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > > > too late. Similarly the add should always be after the HW object is
> > > > > allocated.
> > > >
> > > > It is true to not converted object (QP and MR), everything that was
> > > > converted has two steps: rdma_restrack_put() before creation,
> > > > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > > > successful destroy.
> > >
> > > It must be before destroy not after.
> >
> > We need rdma_restrack_put() after destroy to release memory.
>
> The netlink ops must be blocked before ops->destroy and the memory
> freed after ops->destroy success.
>
> It must work like that since the fill stuff was added as ops - no
> choice.

So I will need to split _del() into two calls: one is the real _del()
and the other is a _put().

Thanks

>
> Jason
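
To make the ordering above concrete, here is a rough userspace sketch of
the split, not kernel code. Everything in it is hypothetical and exists
only for illustration: struct tracked_res, res_alloc(), res_add(),
res_del(), res_put(), hw_destroy() and object_destroy() are stand-ins,
not the restrack API. It only shows the sequence discussed in the thread:
hide the entry from lookups first, destroy the HW object second, and
release the memory last, and only if the destroy succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tracked resource (not the kernel's struct). */
struct tracked_res {
	int refcount;	/* memory is freed when this reaches zero */
	bool visible;	/* true while lookups (e.g. a netlink dump) may find it */
};

/* Number of allocated-but-not-yet-freed entries, for demonstration only. */
static int live_objects;

/* Allocate an entry holding one reference; it is not visible yet. */
static struct tracked_res *res_alloc(void)
{
	struct tracked_res *res = calloc(1, sizeof(*res));

	if (!res)
		return NULL;
	res->refcount = 1;
	live_objects++;
	return res;
}

/* Publish the entry; meant to run only after the HW object exists. */
static void res_add(struct tracked_res *res)
{
	res->visible = true;
}

/* Hide the entry from lookups; meant to run before the HW destroy. */
static void res_del(struct tracked_res *res)
{
	res->visible = false;
}

/* Drop one reference and free the memory when it was the last one. */
static void res_put(struct tracked_res *res)
{
	assert(res->refcount > 0);
	if (--res->refcount == 0) {
		live_objects--;
		free(res);
	}
}

/* Hypothetical HW destroy callback that can be made to fail. */
static int hw_destroy(bool fail)
{
	return fail ? -1 : 0;
}

/*
 * Destroy path: del, then HW destroy, then put on success only.
 * What to do with visibility when the destroy fails is an assumption
 * of this sketch: the entry simply stays hidden and the memory stays.
 */
static int object_destroy(struct tracked_res *res, bool hw_fail)
{
	int ret;

	res_del(res);			/* 1. block lookups */
	ret = hw_destroy(hw_fail);	/* 2. tear down the HW object */
	if (ret)
		return ret;		/* memory must survive a failed destroy */
	res_put(res);			/* 3. release memory */
	return 0;
}
```

With this shape a failed destroy leaves the memory allocated so the
caller can retry, while a successful one drops the last reference.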