On Fri, Sep 13, 2024 at 10:00 AM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote: > > 在 2024/9/12 2:29, Selvin Xavier 写道: > > There is a race between the CREQ tasklet and destroy > > qp when accessing the qp-handle table. There is > > a chance of reading a valid qp-handle in the > > CREQ tasklet handler while the QP is already moving > > ahead with the destruction. > > > > Fixing this race by implementing a table-lock to > > synchronize the access. > > > > Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") > > Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@xxxxxxxxxxxx> > > Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx> > > --- > > drivers/infiniband/hw/bnxt_re/qplib_fp.c | 5 +++++ > > drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 12 ++++++++---- > > drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 ++ > > 3 files changed, 15 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > index 42e98e5..5d36216 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c > > @@ -1524,12 +1524,15 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > > struct creq_destroy_qp_resp resp = {}; > > struct bnxt_qplib_cmdqmsg msg = {}; > > struct cmdq_destroy_qp req = {}; > > + unsigned long flags; > > u32 tbl_indx; > > int rc; > > > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > > Because the race occurs between tasklets, spin_lock_bh is enough to this? You are right. The race occurs in the BH handler. bnxt_qplib_service_creq , which is scheduled in the tasklet context, is already using the spin_lock_irqsave for synchronizing this function with other contexts. The current code change is with-in this context. I added this change on top of the current code and used irqsave. But I will fix both the spin lock usage and post a v2. Thanks for the review. > > Zhu Yanjun > > > tbl_indx = map_qp_id_to_tbl_indx(qp->id, rcfw); > > rcfw->qp_tbl[tbl_indx].qp_id = BNXT_QPLIB_QP_ID_INVALID; > > rcfw->qp_tbl[tbl_indx].qp_handle = NULL; > > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > > > bnxt_qplib_rcfw_cmd_prep((struct cmdq_base *)&req, > > CMDQ_BASE_OPCODE_DESTROY_QP, > > @@ -1540,8 +1543,10 @@ int bnxt_qplib_destroy_qp(struct bnxt_qplib_res *res, > > sizeof(resp), 0); > > rc = bnxt_qplib_rcfw_send_message(rcfw, &msg); > > if (rc) { > > + spin_lock_irqsave(&rcfw->tbl_lock, flags); > > rcfw->qp_tbl[tbl_indx].qp_id = qp->id; > > rcfw->qp_tbl[tbl_indx].qp_handle = qp; > > + spin_unlock_irqrestore(&rcfw->tbl_lock, flags); > > return rc; > > } > > > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > index 3ffaef0c..993c356 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c > > @@ -637,17 +637,21 @@ static int bnxt_qplib_process_qp_event(struct bnxt_qplib_rcfw *rcfw, > > case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION: > > err_event = (struct creq_qp_error_notification *)qp_event; > > qp_id = le32_to_cpu(err_event->xid); > > + spin_lock(&rcfw->tbl_lock); > > tbl_indx = map_qp_id_to_tbl_indx(qp_id, rcfw); > > qp = rcfw->qp_tbl[tbl_indx].qp_handle; > > + if (!qp) { > > + spin_unlock(&rcfw->tbl_lock); > > + break; > > + } > > + bnxt_qplib_mark_qp_error(qp); > > + rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > > + spin_unlock(&rcfw->tbl_lock); > > dev_dbg(&pdev->dev, "Received QP error notification\n"); > > dev_dbg(&pdev->dev, > > "qpid 0x%x, req_err=0x%x, resp_err=0x%x\n", > > qp_id, err_event->req_err_state_reason, > > err_event->res_err_state_reason); > > - if (!qp) > > - break; > > - bnxt_qplib_mark_qp_error(qp); > > - rc = rcfw->creq.aeq_handler(rcfw, qp_event, qp); > > break; > > default: > > /* > > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > index 45996e6..07779ae 100644 > > --- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h > > @@ -224,6 +224,8 @@ struct bnxt_qplib_rcfw { > > struct bnxt_qplib_crsqe *crsqe_tbl; > > int qp_tbl_size; > > struct bnxt_qplib_qp_node *qp_tbl; > > + /* To synchronize the qp-handle hash table */ > > + spinlock_t tbl_lock; > > u64 oos_prev; > > u32 init_oos_stats; > > u32 cmdq_depth; >
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature