RE: [PATCH v4 1/1] RDMA/mana_ib: Add EQ interrupt support to mana ib driver.

Long Li <longli@xxxxxxxxxxxxx> · Tue, 1 Aug 2023 19:06:57 +0000

> Subject: Re: [PATCH v4 1/1] RDMA/mana_ib: Add EQ interrupt support to mana ib
> driver.
> 
> On Fri, Jul 28, 2023 at 06:22:53PM +0000, Long Li wrote:
> > > Subject: Re: [PATCH v4 1/1] RDMA/mana_ib: Add EQ interrupt support
> > > to mana ib driver.
> > >
> > > On Fri, Jul 28, 2023 at 05:51:46PM +0000, Long Li wrote:
> > > > > Subject: Re: [PATCH v4 1/1] RDMA/mana_ib: Add EQ interrupt
> > > > > support to mana ib driver.
> > > > >
> > > > > On Fri, Jul 28, 2023 at 05:07:49PM +0000, Wei Hu wrote:
> > > > > > Add EQ interrupt support for mana ib driver. Allocate EQs per
> > > > > > ucontext to receive interrupt. Attach EQ when CQ is created.
> > > > > > Call CQ interrupt handler when completion interrupt happens.
> > > > > > EQs are destroyed when ucontext is deallocated.
> > > > >
> > > > > It seems strange that interrupts would be somehow linked to a ucontext?
> > > > > interrupts are highly limited, you can DOS the entire system if
> > > > > someone abuses this.
> > > > >
> > > > > Generally I expect a properly functioning driver to use one
> > > > > > interrupt per CPU core.
> > > >
> > > > Yes, MANA uses one interrupt per CPU. One interrupt is shared
> > > > among multiple EQs.
> > >
> > > So you have another multiplexing layer between the interrupt and the
> > > EQ? That is a lot of multiplexing layers..
> > >
> > > > > You should tie the CQ to a shared EQ belong to the core that the
> > > > > CQ wants to have affinity to.
> > > >
> > > > The reason for using a separate EQ for a ucontext, is for
> > > > preventing DOS. If we use a shared EQ, a single ucontext can storm
> > > > this shared EQ affecting other users.
> > >
> > > With a proper design it should not be possible. The CQ adds an entry
> > > to the EQ and that should be rate limited by the ability of
> > > userspace to schedule to re-arm the CQ.
> >
> > I think DPDK user space can sometimes storm the EQ by arming the CQ
> > from user-mode.
> 
> Maybe maliciously you can do a blind re-arm, but nothing sane should do that.

Yes, we don't expect a sane user to do that. But in a containerized cloud VM, we can't trust any user. The hardware/driver is designed to contain the damage from such bad behavior within the offending user's own environment.

> 
> > With a malicious DPDK user, this code can be abused to arm the CQ at
> > extremely high rate.
> 
> Again, the rate of CQ re-arm is limited by the ability of userspace to schedule, I'm
> reluctant to consider that a DOS vector. Doesn't your HW have EQ overflow
> recovery?

The HW supports detection of and recovery from EQ overflow, but recovery is on the slow path. A bad user can still affect other users if they share the same EQ and it goes into recovery mode from time to time.

> 
> Frankly, stacking more layers of IRQ multiplexing doesn't seem like it should solve
> any problems, you are just shifting where the DOS can occur. Allowing userspace
> to create EQs is its own DOS direction, either you exhaust and DOS the number of
> EQs or you DOS the multiplexing layer between the interrupt and the EQ.

The hardware is designed to support a very large number of EQs. In practice, this hardware limit is unlikely to be reached before other resources run out.

The driver interrupt code limits the CPU processing time of each EQ by reading only a small batch of EQEs from it per interrupt. This guarantees that all the EQs on this CPU are checked, and it bounds the interrupt processing time spent on any given EQ. In this way, a bad EQ (one that is stormed by a bad user doing unreasonable re-arming of the CQ) can't starve the other EQs on this CPU.
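
To illustrate the batching idea, here is a rough user-space sketch. It is not the mana driver code: struct sim_eq, sim_irq_handler() and the budget of 8 are names and values made up for this example, and the real handler works on hardware queues rather than counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-EQ batch size; the real driver's value may differ. */
#define EQE_BUDGET 8

/* Simulated EQ: only counts entries, no real hardware queue. */
struct sim_eq {
	unsigned long pending;   /* EQEs waiting in this EQ */
	unsigned long processed; /* EQEs handled so far */
};

/*
 * Simulate one interrupt shared by n EQs on a CPU.
 * Every EQ is visited, and at most EQE_BUDGET entries are
 * consumed from each, so a flooded EQ cannot monopolize the pass.
 * Returns the total number of EQEs handled in this pass.
 */
static unsigned long sim_irq_handler(struct sim_eq *eqs, size_t n)
{
	unsigned long total = 0;
	size_t i;

	assert(eqs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned long batch = eqs[i].pending;

		if (batch > EQE_BUDGET)
			batch = EQE_BUDGET;

		eqs[i].pending -= batch;
		eqs[i].processed += batch;
		total += batch;
	}

	return total;
}
```

With three EQs holding 1000000, 3 and 5 pending entries, one pass handles 8, 3 and 5 entries respectively: the two well-behaved EQs are fully drained, and the stormed EQ only gets its fixed share of the interrupt time.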

Thanks,

Long