Re: [PATCH rdma-next v3 6/9] RDMA/restrack: Add error handling while adding restrack object

On Sun, Oct 04, 2020 at 03:49:20PM +0300, Leon Romanovsky wrote:
> On Sun, Oct 04, 2020 at 09:32:26AM -0300, Jason Gunthorpe wrote:
> > On Sun, Oct 04, 2020 at 09:48:18AM +0300, Leon Romanovsky wrote:
> > > On Fri, Oct 02, 2020 at 10:16:28AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 02, 2020 at 03:57:20PM +0300, Leon Romanovsky wrote:
> > > > > On Fri, Oct 02, 2020 at 09:42:17AM -0300, Jason Gunthorpe wrote:
> > > > > > On Sat, Sep 26, 2020 at 01:19:35PM +0300, Leon Romanovsky wrote:
> > > > > > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > > > > > > index 12ebacf52958..1abcb01d362f 100644
> > > > > > > +++ b/drivers/infiniband/core/cq.c
> > > > > > > @@ -267,10 +267,25 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > > > > > >  		goto out_destroy_cq;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	rdma_restrack_add(&cq->res);
> > > > > > > +	ret = rdma_restrack_add(&cq->res);
> > > > > > > +	if (ret)
> > > > > > > +		goto out_poll_cq;
> > > > > > > +
> > > > > > >  	trace_cq_alloc(cq, nr_cqe, comp_vector, poll_ctx);
> > > > > > >  	return cq;
> > > > > > >
> > > > > > > +out_poll_cq:
> > > > > > > +	switch (cq->poll_ctx) {
> > > > > > > +	case IB_POLL_SOFTIRQ:
> > > > > > > +		irq_poll_disable(&cq->iop);
> > > > > > > +		break;
> > > > > > > +	case IB_POLL_WORKQUEUE:
> > > > > > > +	case IB_POLL_UNBOUND_WORKQUEUE:
> > > > > > > +		cancel_work_sync(&cq->work);
> > > > > >
> > > > > > This error unwind is *technically* in the wrong order, it is wrong in
> > > > > > ib_free_cq too which is an actual bug.
> > > > > >
> > > > > > The cq->comp_handler should be set before calling create_cq and undone
> > > > > > after calling destroy_wq. We can do this right now that the
> > > > > > allocations have been reworked.
> > > > > >
> > > > > > Otherwise there is no assurance the ib_cq_completion_workqueue() won't
> > > > > > be called after this cancel == use after free
> > > > > >
> > > > > > Also, you need to check all the rdma_restrack_del()'s, they should
> > > > > > always be *before* destroying the HW object, eg ib_free_cq() has it
> > > > > > too late. Similarly the add should always be after the HW object is
> > > > > > allocated.
> > > > >
> > > > > It is true to not converted object (QP and MR), everything that was
> > > > > converted has two steps: rdma_restrack_put() before creation,
> > > > > rdma_restrack_add() right after creation and rdma_restrack_del() after
> > > > > successful destroy.
> > > >
> > > > It must be before destroy not after.
> > >
> > > We need rdma_restrack_put() after destroy to release memory.
> >
> > The netlink ops must be blocked before ops->destroy and the memory
> > freed after ops->destroy success.
> >
> > It must work like that since the fill stuff was added as ops - no
> > choice.
> 
> So I will need to separate _del() to two calls, one is real _del() and
> another _put().

I think you end up with destroy being like

restrack_remove_from_xarray_and_stop_nl();
rc = ops->destroy();
if (rc) {
	restrack_return_to_xarray();
	return rc;
}

restrack_put_for_freeing_memory();

It ends up with *two* refcounts: a kref for the memory lifetime and a
refcount_t for the HW object lifetime (basically a HW object destroy
rwlock).

To solve the immediate problems I'd suggest something like

static inline int __rdma_destroy_hw_obj(struct restrack *res, int destroy_rc);

#define rdma_destroy_hw_obj(restrack, op, ...) \
  ({ __rdma_destroy_hw_obj_pre(restrack); \
     __rdma_destroy_hw_obj(restrack, op(__VA_ARGS__)); })

Which does the sequencing above
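
For illustration only, here is a rough user-space model of that sequencing.
Everything in it is an assumption made for the sketch: struct restrack is a
stand-in with a "visible" flag (modelling xarray membership and netlink
visibility) and a plain integer in place of the kref, and fake_destroy_cq()
is a hypothetical driver callback. It relies on GNU C statement expressions,
as the macro above does, and is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a restrack entry; not the kernel structure. */
struct restrack {
	bool visible;	/* models "in the xarray, reachable by netlink" */
	int mem_refs;	/* models the kref that keeps the memory alive */
};

/* Step 1: hide the object from lookups before the HW object is touched. */
static inline void __rdma_destroy_hw_obj_pre(struct restrack *res)
{
	assert(res->visible);
	res->visible = false;
}

/* Step 2: act on the return code of the driver's destroy callback. */
static inline int __rdma_destroy_hw_obj(struct restrack *res, int destroy_rc)
{
	if (destroy_rc) {
		/* Destroy failed: the HW object still exists, expose it again. */
		res->visible = true;
		return destroy_rc;
	}
	/* Destroy succeeded: drop the reference that pins the memory. */
	res->mem_refs--;
	return 0;
}

/* The statement expression forces "pre" to run before op() is evaluated. */
#define rdma_destroy_hw_obj(res, op, ...) \
	({ __rdma_destroy_hw_obj_pre(res); \
	   __rdma_destroy_hw_obj(res, op(__VA_ARGS__)); })

/* Hypothetical driver destroy callback that can be told to fail. */
static int fake_destroy_cq(int *hw_alive, bool fail)
{
	if (fail)
		return -EBUSY;
	*hw_alive = 0;
	return 0;
}
```

A caller would then write something like
rdma_destroy_hw_obj(&res, fake_destroy_cq, &hw_alive, false). On failure the
entry is visible again and no memory reference has been dropped.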

Jason


