> From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Thursday, October 3, 2019 7:17 PM > On Thu, Oct 03, 2019 at 03:03:41PM +0300, Michal Kalderon wrote: > > > diff --git a/drivers/infiniband/hw/qedr/qedr_iw_cm.c > > b/drivers/infiniband/hw/qedr/qedr_iw_cm.c > > index 22881d4442b9..ebc6bc25a0e2 100644 > > +++ b/drivers/infiniband/hw/qedr/qedr_iw_cm.c > > @@ -79,6 +79,28 @@ qedr_fill_sockaddr6(const struct qed_iwarp_cm_info > *cm_info, > > } > > } > > > > +static void qedr_iw_free_qp(struct kref *ref) { > > + struct qedr_qp *qp = container_of(ref, struct qedr_qp, refcnt); > > + > > + xa_erase_irq(&qp->dev->qps, qp->qp_id); > > why is it _irq? Where are we in an irq when using the xa_lock on this xarray? We could be under a spin lock when called from several locations in core/iwcm.c > > > > + kfree(qp); > > [..] > > > @@ -516,8 +548,10 @@ int qedr_iw_connect(struct iw_cm_id *cm_id, > struct iw_cm_conn_param *conn_param) > > return -ENOMEM; > > > > ep->dev = dev; > > + kref_init(&ep->refcnt); > > + > > + kref_get(&qp->refcnt); > > Here 'qp' comes out of an xa_load, but the QP is still visible in the xarray with > a 0 refcount, so this is invalid. The core/iwcm takes a refcnt of the QP before calling connect, so it can't be with refcnt zero > > Also, the xa_load doesn't have any locking around it, so the entire thing looks > wrong to me. Since the functions calling it from core/iwcm ( connect / accept ) take a qp Ref-cnt before the calling there's no risk of the entry being deleted while xa_load is called > > Most probably you want to hold the xa_lock during xa_load and then use a > kref_get_not_zero - failure of the get also means the qp is not in the xarray > > Or rework things so the qp is removed from the xarray earlier, I'm not sure. This would make qedr more robust, but I think it not needed given the existing Core/iwcm implementation. Thanks, Michal > > Jason