> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Friday, October 4, 2019 8:28 PM
>
> On Fri, Oct 04, 2019 at 05:10:20PM +0000, Michal Kalderon wrote:
> > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> > > owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Gunthorpe
> > >
> > > On Thu, Oct 03, 2019 at 07:33:00PM +0000, Michal Kalderon wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxx>
> > > > > Sent: Thursday, October 3, 2019 7:17 PM On Thu, Oct 03, 2019 at
> > > > > 03:03:41PM +0300, Michal Kalderon wrote:
> > > > >
> > > > > > diff --git a/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > b/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > index 22881d4442b9..ebc6bc25a0e2 100644
> > > > > > +++ b/drivers/infiniband/hw/qedr/qedr_iw_cm.c
> > > > > > @@ -79,6 +79,28 @@ qedr_fill_sockaddr6(const struct qed_iwarp_cm_info *cm_info,
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > +static void qedr_iw_free_qp(struct kref *ref) {
> > > > > > +	struct qedr_qp *qp = container_of(ref, struct qedr_qp, refcnt);
> > > > > > +
> > > > > > +	xa_erase_irq(&qp->dev->qps, qp->qp_id);
> > > > >
> > > > > why is it _irq? Where are we in an irq when using the xa_lock on
> > > > > this xarray?
> > > > We could be under a spin lock when called from several locations
> > > > in core/iwcm.c
> > >
> > > spinlock is OK, _irq is only needed if the code needs to mask IRQs
> > > because there is a user of the same lock in an IRQ context, see the
> > > documentation.
> > >
> > > > > > @@ -516,8 +548,10 @@ int qedr_iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
> > > > > >  	return -ENOMEM;
> > > > > >
> > > > > >  	ep->dev = dev;
> > > > > > +	kref_init(&ep->refcnt);
> > > > > > +
> > > > > > +	kref_get(&qp->refcnt);
> > > > >
> > > > > Here 'qp' comes out of an xa_load, but the QP is still visible
> > > > > in the xarray with a 0 refcount, so this is invalid.
> > > > The core/iwcm takes a refcnt of the QP before calling connect, so
> > > > it can't be with refcnt zero
> > > >
> > > > > Also, the xa_load doesn't have any locking around it, so the
> > > > > entire thing looks wrong to me.
> > > > Since the functions calling it from core/iwcm ( connect / accept )
> > > > take a qp Ref-cnt before the calling there's no risk of the entry
> > > > being deleted while xa_load is called
> > >
> > > Then why look it up in an xarray at all? If you already have the
> > > pointer to get a refcount then pass the refcounted pointer in and
> > > get rid of the sketchy xarray lookup.
> > >
> > I don't have the pointer, the core/iwcm has the pointer.
> > The interface between the core and driver is that the driver gets a qp
> > number from the core/iwcm and needs to get the QP pointer from it's
> > database. All the iWARP drivers are implemented this way, this is also not
> > new to qedr.
>
> That seems crazy.

I can take an action item on looking into redesigning this together with
the other iwarp vendors.
For this series, that attempts to fix some leaks and concurrency issues in
qedr, are there any more issues except the xa_erase_irq which you would
want me to fix for v2?

Thanks,
Michal

> Jason
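
For reference, the concern raised in the review (taking a reference on an object that is still reachable through a lookup table while its count may already be zero) can be illustrated with a small userspace sketch. This is not the qedr code and not kernel code: demo_qp, demo_table, demo_get_unless_zero, demo_put and demo_lookup_get are hypothetical names, C11 atomics stand in for kref, and a plain array stands in for the xarray. The sketch also leaves out the locking or RCU protection a real lookup would need so the object cannot be freed between the load and the reference grab.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted QP; not the qedr structure. */
struct demo_qp {
	atomic_int refcnt;
	int qp_id;
};

#define DEMO_MAX_QPS 8

/* Hypothetical stand-in for the driver's id -> QP table. */
static struct demo_qp *demo_table[DEMO_MAX_QPS];

/* Take a reference only if the object is still live (count > 0). */
static bool demo_get_unless_zero(struct demo_qp *qp)
{
	int old = atomic_load(&qp->refcnt);

	while (old > 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak(&qp->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_put(struct demo_qp *qp)
{
	int old = atomic_fetch_sub(&qp->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Look up by id and return the QP with a reference held, or NULL. */
static struct demo_qp *demo_lookup_get(int qp_id)
{
	struct demo_qp *qp;

	if (qp_id < 0 || qp_id >= DEMO_MAX_QPS)
		return NULL;
	qp = demo_table[qp_id];
	if (qp && demo_get_unless_zero(qp))
		return qp;
	return NULL;
}
```

With this shape, a lookup that races with the final put returns NULL instead of resurrecting an object whose count has already reached zero. An unconditional increment would not give that guarantee.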