> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Thursday, 6 June 2024 12:51
> To: Konstantin Taranov <kotaranov@xxxxxxxxxxxxx>
> Cc: Konstantin Taranov <kotaranov@xxxxxxxxxxxxxxxxxxx>; Wei Hu <weh@xxxxxxxxxxxxx>;
> sharmaajay@xxxxxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; jgg@xxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events
>
> On Thu, Jun 06, 2024 at 09:30:51AM +0000, Konstantin Taranov wrote:
> > > > > > +static void
> > > > > > +mana_ib_event_handler(void *ctx, struct gdma_queue *q, struct gdma_event *event)
> > > > > > +{
> > > > > > +	struct mana_ib_dev *mdev = (struct mana_ib_dev *)ctx;
> > > > > > +	struct mana_ib_qp *qp;
> > > > > > +	struct ib_event ev;
> > > > > > +	unsigned long flag;
> > > > > > +	u32 qpn;
> > > > > > +
> > > > > > +	switch (event->type) {
> > > > > > +	case GDMA_EQE_RNIC_QP_FATAL:
> > > > > > +		qpn = event->details[0];
> > > > > > +		xa_lock_irqsave(&mdev->qp_table_rq, flag);
> > > > > > +		qp = xa_load(&mdev->qp_table_rq, qpn);
> > > > > > +		if (qp)
> > > > > > +			refcount_inc(&qp->refcount);
> > > > > > +		xa_unlock_irqrestore(&mdev->qp_table_rq, flag);
> > > > > > +		if (!qp)
> > > > > > +			break;
> > > > > > +		if (qp->ibqp.event_handler) {
> > > > > > +			ev.device = qp->ibqp.device;
> > > > > > +			ev.element.qp = &qp->ibqp;
> > > > > > +			ev.event = IB_EVENT_QP_FATAL;
> > > > > > +			qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
> > > > > > +		}
> > > > > > +		if (refcount_dec_and_test(&qp->refcount))
> > > > > > +			complete(&qp->free);
> > > > > > +		break;
> > > > > > +	default:
> > > > > > +		break;
> > > > > > +	}
> > > > > > +}
> > > > >
> > > > > <...>
> > > > >
> > > > > > @@ -620,6 +626,11 @@ static int mana_ib_destroy_rc_qp(struct mana_ib_qp *qp, struct ib_udata *udata)
> > > > > >  	container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
> > > > > >  	int i;
> > > > > >
> > > > > > +	xa_erase_irq(&mdev->qp_table_rq, qp->ibqp.qp_num);
> > > > > > +	if (refcount_dec_and_test(&qp->refcount))
> > > > > > +		complete(&qp->free);
> > > > > > +	wait_for_completion(&qp->free);
> > > > >
> > > > > This flow is unclear to me. You are destroying the QP and need
> > > > > to make sure that mana_ib_event_handler is not running at that
> > > > > point and not mess with refcount and complete.
> > > >
> > > > Hi, Leon. Thanks for the concern. Here is the clarification:
> > > > The flow is similar to what mlx5 does with mlx5_get_rsc and
> > > > mlx5_core_put_rsc.
> > > > When we get a QP resource, we increment the counter while holding
> > > > the xa lock.
> > > > So, when we destroy a QP, the code first removes the QP from the
> > > > xa to ensure that nobody can get it.
> > > > And then we check whether mana_ib_event_handler is holding it
> > > > with refcount_dec_and_test.
> > > > If the QP is held, then we wait for mana_ib_event_handler to call
> > > > complete.
> > > > Once the wait returns, it is impossible for the QP to be referenced,
> > > > since it is not in the xa and all references have been dropped.
> > > > So now we can release the QP in HW, so the QPN can be assigned to
> > > > new QPs.
> > > >
> > > > Leon, have you spotted a bug? Or just wanted to understand the flow?
> > >
> > > I understand the "general" flow, but think that the implementation is
> > > very convoluted here. In addition, mlx5 and other drivers make sure
> > > that the HW object is not freed before they free it. They don't
> > > "mess" with ibqp, and probably you should do the same.
> >
> > I can make the code more readable by introducing 4 helpers: add_qp_ref,
> > get_qp_ref, put_qp_ref, destroy_qp_ref.
> > Would it make the code less convoluted for you?
>
> The thing is that you are trying to open-code part of the restrack logic,
> which already has an xarray DB per QPN. Maybe you should extend it to
> support your case by adding some sort of barrier to prevent the QP from
> being used.

Thanks for the suggestion. I can have a look later to suggest something, but
I guess it is hard to generalize for mana. Another blocker, which is not in
this patch, is that in mana one RC QP has up to 5 ids (one per work queue),
where only one of them is the QPN. We can get CQEs that reference any of the
5 ids, which may not be supported by the restrack logic. So in future
patches, where I add support for send/recv/poll in the kernel, I will add
more indexes to the table, most of which are not QPNs but work queue ids.

Anyway, I think making helpers at this point is a good idea, as I will get
fewer questions later when I send the polling code. So, I will send v2
tomorrow with the aforementioned helpers.

Thanks

> >
> > The devices are different. Mana can get an EQE with a QPN that has
> > already been destroyed by the OS. With that reference counting I make
> > sure we do not destroy a QP which is used in EQE processing. We still
> > destroy the HW resource at the same time as before the change.
> > The xa table is just a lookup table, which says whether the object can
> > be referenced or not. The change just dictates that we first make the QP
> > not referenceable.
> >
> > Note that if we remove the QP from the table after we destroy it in HW,
> > we can have a bug due to a collision in the xa table when another entity
> > creates a QP at the same time. Since the QPN is released in HW, it will
> > most likely be given to the next create_qp (mana, unlike mlx, does not
> > assign QPNs with an increment, so it hands out recently used QPNs). So
> > the create_qp can fail because the QPN is still present in the xa.
> >
> > And what do you mean by "don't 'mess' with ibqp"? Are you saying that
> > we cannot process QP-related interrupts? What do you propose to change?
> > In any case it will be the same logic:
> > 1) remove from the table
> > 2) make sure that the current IRQ handler does not hold a reference;
> > I use counters for that, as most other IB providers do.
> >
> > I would also appreciate a list of changes needed to get this patch
> > approved, or confirmation that no changes are required.
>
> I'm not asking to change anything at this point, just trying to see if
> there is a more general way to solve this problem, which exists in all
> drivers now and in the future.
>
Thanks!

> Thanks
>
> > > Thanks
> > >
> > > > > Thanks
> > > > >
> > > > > > Thanks