RE: [EXT] Re: [PATCH rdma-next 1/2] RDMA/qedr: Fix synchronization methods and memory leaks in qedr

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Friday, October 4, 2019 8:28 PM
> 
> On Fri, Oct 04, 2019 at 05:10:20PM +0000, Michal Kalderon wrote:
> > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> > > owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Gunthorpe
> > >
> > > On Thu, Oct 03, 2019 at 07:33:00PM +0000, Michal Kalderon wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxx>
> > > > > Sent: Thursday, October 3, 2019 7:17 PM On Thu, Oct 03, 2019 at
> > > > > 03:03:41PM +0300, Michal Kalderon wrote:
> > > > >
> > > > > > diff --git a/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > b/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > index 22881d4442b9..ebc6bc25a0e2 100644
> > > > > > +++ b/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > @@ -79,6 +79,28 @@ qedr_fill_sockaddr6(const struct
> > > > > > qed_iwarp_cm_info
> > > > > *cm_info,
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > +static void qedr_iw_free_qp(struct kref *ref) {
> > > > > > +	struct qedr_qp *qp = container_of(ref, struct qedr_qp,
> > > > > > +refcnt);
> > > > > > +
> > > > > > +	xa_erase_irq(&qp->dev->qps, qp->qp_id);
> > > > >
> > > > > why is it _irq? Where are we in an irq when using the xa_lock on
> > > > > this
> > > xarray?
> > > > We could be under a spin lock when called from several locations
> > > > in core/iwcm.c
> > >
> > > spinlock is OK, _irq is only needed if the code needs to mask IRQs
> > > because there is a user of the same lock in an IRQ context, see the
> documentation.
> > >
> > > > > > @@ -516,8 +548,10 @@ int qedr_iw_connect(struct iw_cm_id
> > > > > > *cm_id,
> > > > > struct iw_cm_conn_param *conn_param)
> > > > > >  		return -ENOMEM;
> > > > > >
> > > > > >  	ep->dev = dev;
> > > > > > +	kref_init(&ep->refcnt);
> > > > > > +
> > > > > > +	kref_get(&qp->refcnt);
> > > > >
> > > > > Here 'qp' comes out of an xa_load, but the QP is still visible
> > > > > in the xarray with a 0 refcount, so this is invalid.
> > >
> > > > The core/iwcm takes a refcnt of the QP before calling connect, so
> > > > it can't be with refcnt zero
> > >
> > > > > Also, the xa_load doesn't have any locking around it, so the
> > > > > entire thing looks wrong to me.
> > > > Since the functions calling it from core/iwcm ( connect / accept )
> > > > take a qp Ref-cnt before the calling there's no risk of the entry
> > > > being deleted while xa_load is called
> > >
> > > Then why look it up in an xarray at all? If you already have the
> > > pointer to get a refcount then pass the refcounted pointer in and
> > > get rid of the sketchy xarray lookup.
> > >
> > I don't have the pointer, the core/iwcm has the pointer.
> > The interface between the core and driver is that the driver gets a qp
> > number from the core/iwcm and needs to get the QP pointer from it's
> > database. All the iWARP drivers are implemented this way, this is also not
> new to qedr.
> 
> That seems crazy.

I can take an action item on looking into redesigning this together with the other iwarp vendors.
For this series, that attempts to fix some leaks and concurrency issues in qedr ,
are there any more issues except the  xa_erase_irq which you would want me to fix for v2?

Thanks,
Michal 

> Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux