> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Thursday, 6 June 2024 12:51
> To: Konstantin Taranov <kotaranov@xxxxxxxxxxxxx>
> Cc: Konstantin Taranov <kotaranov@xxxxxxxxxxxxxxxxxxx>; Wei Hu <weh@xxxxxxxxxxxxx>;
> sharmaajay@xxxxxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; jgg@xxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events
>
> On Thu, Jun 06, 2024 at 09:30:51AM +0000, Konstantin Taranov wrote:
> > > > > > +static void
> > > > > > +mana_ib_event_handler(void *ctx, struct gdma_queue *q, struct gdma_event *event)
> > > > > > +{
> > > > > > +	struct mana_ib_dev *mdev = (struct mana_ib_dev *)ctx;
> > > > > > +	struct mana_ib_qp *qp;
> > > > > > +	struct ib_event ev;
> > > > > > +	unsigned long flag;
> > > > > > +	u32 qpn;
> > > > > > +
> > > > > > +	switch (event->type) {
> > > > > > +	case GDMA_EQE_RNIC_QP_FATAL:
> > > > > > +		qpn = event->details[0];
> > > > > > +		xa_lock_irqsave(&mdev->qp_table_rq, flag);
> > > > > > +		qp = xa_load(&mdev->qp_table_rq, qpn);
> > > > > > +		if (qp)
> > > > > > +			refcount_inc(&qp->refcount);
> > > > > > +		xa_unlock_irqrestore(&mdev->qp_table_rq, flag);
> > > > > > +		if (!qp)
> > > > > > +			break;
> > > > > > +		if (qp->ibqp.event_handler) {
> > > > > > +			ev.device = qp->ibqp.device;
> > > > > > +			ev.element.qp = &qp->ibqp;
> > > > > > +			ev.event = IB_EVENT_QP_FATAL;
> > > > > > +			qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
> > > > > > +		}
> > > > > > +		if (refcount_dec_and_test(&qp->refcount))
> > > > > > +			complete(&qp->free);
> > > > > > +		break;
> > > > > > +	default:
> > > > > > +		break;
> > > > > > +	}
> > > > > > +}
> > > > >
> > > > > <...>
> > > > >
> > > > > > @@ -620,6 +626,11 @@ static int mana_ib_destroy_rc_qp(struct mana_ib_qp *qp, struct ib_udata *udata)
> > > > > >  	container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
> > > > > >  	int i;
> > > > > >
> > > > > > +	xa_erase_irq(&mdev->qp_table_rq, qp->ibqp.qp_num);
> > > > > > +	if (refcount_dec_and_test(&qp->refcount))
> > > > > > +		complete(&qp->free);
> > > > > > +	wait_for_completion(&qp->free);
> > > > >
> > > > > This flow is unclear to me. You are destroying the QP and need
> > > > > to make sure that mana_ib_event_handler is not running at that
> > > > > point and not mess with refcount and complete.
> > > >
> > > > Hi, Leon. Thanks for the concern. Here is the clarification:
> > > > The flow is similar to what mlx5 does with mlx5_get_rsc and
> > > > mlx5_core_put_rsc.
> > > > When we get a QP resource, we increment the counter while holding
> > > > the xa lock.
> > > > So, when we destroy a QP, the code first removes the QP from the
> > > > xa to ensure that nobody can get it.
> > > > And then we check whether mana_ib_event_handler is holding it
> > > > with refcount_dec_and_test.
> > > > If the QP is held, then we wait for mana_ib_event_handler to call
> > > > complete.
> > > > Once the wait returns, it is impossible for the QP to be referenced,
> > > > since it is not in the xa and all references have been dropped.
> > > > So now we can release the QP in HW, so the QPN can be assigned to
> > > > new QPs.
> > > >
> > > > Leon, have you spotted a bug? Or just wanted to understand the flow?
> > >
> > > I understand the "general" flow, but think that the implementation is
> > > very convoluted here. In addition, mlx5 and other drivers make sure
> > > that the HW object is not freed before they free it. They don't
> > > "mess" with ibqp, and probably you should do the same.
> >
> > I can make the code more readable by introducing 4 helpers: add_qp_ref,
> > get_qp_ref, put_qp_ref, destroy_qp_ref.
> > Would it make the code less convoluted for you?
>
> The thing is that you are trying to open-code part of the restrack logic,
> which already has an xarray DB per QPN. Maybe you should extend it to
> support your case by adding some sort of barrier to prevent the QP from
> being used.

Thanks for the suggestion. I can have a look later to suggest something, but
I guess it is hard to generalize for mana. Another blocker, which is not in
this patch, is that in mana one RC QP has up to 5 ids (one per work queue),
where only one of them is the QPN. We can get CQEs that reference any of the
5 ids, which may not be supported by the restrack logic. So in future
patches, where I add support for send/recv/poll in the kernel, I will add
more indexes to the table, most of which are not QPNs but work queue ids.

Anyway, I think making helpers at this point is a good idea, as I will get
fewer questions later when I send the polling code. So, I will send v2
tomorrow with the aforementioned helpers.

Thanks

> >
> > The devices are different. Mana can get an EQE with a QPN that has
> > already been destroyed by the OS. With that reference counting I make
> > sure we do not destroy a QP which is used in EQE processing. We still
> > destroy the HW resource at the same time as before the change.
> > The xa table is just a lookup table, which says whether the object can
> > be referenced or not. The change just dictates that we first make the QP
> > not referenceable.
> >
> > Note that if we remove the QP from the table after we destroy it in HW,
> > we can have a bug due to a collision in the xa table when another entity
> > creates a QP at the same time. Since the QPN is released in HW, it will
> > most likely be given to the next create_qp (mana, unlike mlx, does not
> > assign QPNs with an increment, so it hands out recently used QPNs). So
> > the create_qp can fail because the QPN is still present in the xa.
> >
> > And what do you mean by "don't 'mess' with ibqp"? Are you saying that
> > we cannot process QP-related interrupts? What do you propose to change?
> > In any case it will be the same logic:
> > 1) remove from the table
> > 2) make sure that the current IRQ handler does not hold a reference;
> > I use counters for that, as most other IB providers do.
> >
> > I would also appreciate a list of changes needed to get this patch
> > approved, or confirmation that no changes are required.
>
> I'm not asking to change anything at this point, just trying to see if
> there is a more general way to solve this problem, which exists in all
> drivers now and in the future.
>
Thanks!

> Thanks
>
> > > Thanks
> > >
> > > > > Thanks
> > > > >
> > > > > > Thanks