Re: Virtualizing MSI-X on IMS via VFIO

Alex Williamson <alex.williamson@xxxxxxxxxx> · Thu, 24 Jun 2021 11:52:36 -0600

On Thu, 24 Jun 2021 00:00:37 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Wednesday, June 23, 2021 11:20 PM
> >  
> [...]
> > > So the only downside today of allocating more MSI-X vectors than
> > > necessary is memory consumption for the irq descriptors.  
> > 
> > As above, this is a QEMU policy of essentially trying to be a good
> > citizen and allocate only what we can infer the guest is using.  What's
> > a good way for QEMU, or any userspace, to know it's running on a host
> > where vector exhaustion is not an issue?  
> 
> In my proposal a new command (VFIO_DEVICE_ALLOC_IRQS) is
> introduced to separate allocation from enabling. Could the availability
> of this command serve as the indicator that vector
> exhaustion is no longer an issue?

We have options with existing interfaces if we want to provide some
programmatic means through vfio to hint to userspace about vector
usage.  Otherwise I don't see much justification for this new ioctl;
it can largely be done with SET_IRQS, or certainly with extensions of
its flags.
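
A minimal sketch of what "done with SET_IRQS" looks like from userspace,
assuming the standard <linux/vfio.h> uAPI header and an already-open VFIO
device fd: one eventfd is attached per MSI-X vector in [start, start + count).
The helper names msix_set_eventfds() and msix_enable() are hypothetical,
not QEMU or kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: build a VFIO_DEVICE_SET_IRQS request that binds
 * one eventfd to each MSI-X vector in [start, start + count).
 * The caller owns the returned buffer and must free() it.
 */
struct vfio_irq_set *msix_set_eventfds(uint32_t start, uint32_t count,
                                       const int32_t *fds)
{
    size_t argsz = sizeof(struct vfio_irq_set) + count * sizeof(int32_t);
    struct vfio_irq_set *set;

    assert(count > 0);
    set = calloc(1, argsz);
    if (!set)
        return NULL;

    set->argsz = argsz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    set->start = start;
    set->count = count;
    /* data[] carries an array of s32 eventfds, one per vector */
    memcpy(set->data, fds, count * sizeof(int32_t));
    return set;
}

/* Hypothetical wrapper: issue the request; returns the ioctl result. */
int msix_enable(int device_fd, uint32_t start, uint32_t count,
                const int32_t *fds)
{
    struct vfio_irq_set *set = msix_set_eventfds(start, count, fds);
    int ret;

    if (!set)
        return -1;
    ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
    free(set);
    return ret;
}
```

The uAPI header comment also allows -1 in data[] to de-assign (or skip)
individual vectors, so a single SET_IRQS call can adjust triggers within an
already-enabled range.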

> > > So no, we are not going to proliferate this complete ignorance of how
> > > MSI-X actually works and just cram another "feature" into code which is
> > > known to be incorrect.  
> > 
> > Some of the issues of virtualizing MSI-X are unsolvable without
> > creating a new paravirtual interface, but obviously we want to work
> > with existing drivers and unmodified guests, so that's not an option.
> > 
> > To work with what we've got, the vfio API describes the limitation of
> > the host interfaces via the VFIO_IRQ_INFO_NORESIZE flag.  QEMU then
> > makes a choice in an attempt to better reflect what we can infer of the
> > guest programming of the device to incrementally enable vectors.  We  
> 
> After searching its code, I was surprised to find that QEMU doesn't
> even look at this flag today...

There are no examples of the alternative, so handling it would be dead,
untested code.  The flag exists in the uAPI to indicate a limitation of the
underlying implementation that has always existed.  Should we remove
that limitation, which Thomas now sees as possible, then QEMU wouldn't
need to choose between fully allocating the vector table and
incrementally tearing down and re-initializing it.
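
A sketch of that choice, assuming <linux/vfio.h>: read the MSI-X index flags
with VFIO_DEVICE_GET_IRQ_INFO and, when more vectors are needed on a NORESIZE
index, disable everything (DATA_NONE | ACTION_TRIGGER with count = 0) and then
re-enable the larger range.  msix_query(), msix_plan_grow(),
msix_disable_all_request() and the msix_grow_plan enum are hypothetical names.
As noted above, QEMU does not consult the flag today and always takes the
tear-down path when growing, and vfio-pci reports NORESIZE for MSI and MSI-X,
so the MSIX_ADD_RANGE branch for an already-enabled index is exactly the
untested alternative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill *info for the MSI-X index of an open VFIO device fd. */
int msix_query(int device_fd, struct vfio_irq_info *info)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = VFIO_PCI_MSIX_IRQ_INDEX;
    return ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, info);
}

/* Hypothetical outcomes when the guest starts using more vectors. */
enum msix_grow_plan {
    MSIX_NOTHING,         /* enough vectors are already enabled */
    MSIX_ADD_RANGE,       /* enable only [enabled, wanted) */
    MSIX_TEARDOWN_REINIT  /* disable all, then re-enable [0, wanted) */
};

enum msix_grow_plan msix_plan_grow(uint32_t info_flags, uint32_t enabled,
                                   uint32_t wanted)
{
    if (wanted <= enabled)
        return MSIX_NOTHING;
    /* A first enable, or an index without NORESIZE, can just add vectors. */
    if (enabled == 0 || !(info_flags & VFIO_IRQ_INFO_NORESIZE))
        return MSIX_ADD_RANGE;
    /* NORESIZE: the enabled range cannot grow in place. */
    return MSIX_TEARDOWN_REINIT;
}

/* DATA_NONE | ACTION_TRIGGER with count = 0 disables the whole index. */
void msix_disable_all_request(struct vfio_irq_set *set)
{
    memset(set, 0, sizeof(*set));
    set->argsz = sizeof(*set);
    set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    set->start = 0;
    set->count = 0;
}
```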

> > could a) work to provide host kernel interfaces that allow us to remove
> > that noresize flag and b) decide whether QEMU's usage policy can be
> > improved on kernels where vector exhaustion is no longer an issue.  
> 
> Thomas can help confirm, but it looks like the noresize limitation is
> still there. b) makes more sense since Thomas thinks vector exhaustion
> is not an issue now (except for one minor open question about IRTEs).

As noted elsewhere, a) is indeed a limitation of the host interfaces,
not inherent to MSI-X.  Obviously we can look at different QEMU
policies, including generating hardware faults to the VM on vector
exhaustion or unmask failures, interrupt injection, or better inference
of potential vector usage.  Thanks,

Alex