On Thu, 24 Jun 2021 00:00:37 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Wednesday, June 23, 2021 11:20 PM
> >
> [...]
> > > So the only downside today of allocating more MSI-X vectors than
> > > necessary is memory consumption for the irq descriptors.
> >
> > As above, this is a QEMU policy of essentially trying to be a good
> > citizen and allocate only what we can infer the guest is using.  What's
> > a good way for QEMU, or any userspace, to know it's running on a host
> > where vector exhaustion is not an issue?
>
> In my proposal a new command (VFIO_DEVICE_ALLOC_IRQS) is
> introduced to separate allocation from enabling. The availability
> of this command could be the indicator whether vector
> exhaustion is not an issue now?

We have options with existing interfaces if we want to provide some
programmatic means through vfio to hint to userspace about vector
usage.  Otherwise I don't see much justification for this new ioctl, it
can largely be done with SET_IRQS, or certainly with extensions of
flags.

> > > So no, we are not going to proliferate this complete ignorance of how
> > > MSI-X actually works and just cram another "feature" into code which is
> > > known to be incorrect.
> >
> > Some of the issues of virtualizing MSI-X are unsolvable without
> > creating a new paravirtual interface, but obviously we want to work
> > with existing drivers and unmodified guests, so that's not an option.
> >
> > To work with what we've got, the vfio API describes the limitation of
> > the host interfaces via the VFIO_IRQ_INFO_NORESIZE flag.  QEMU then
> > makes a choice in an attempt to better reflect what we can infer of the
> > guest programming of the device to incrementally enable vectors.  We
>
> It's a surprise to me that Qemu even doesn't look at this flag today after
> searching its code...

There are no examples of the alternative, it would be dead, untested
code.
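
For reference, a minimal sketch of how userspace could read the flag
in question through the existing VFIO_DEVICE_GET_IRQ_INFO ioctl.  It
assumes an already-open vfio-pci device file descriptor; the helper and
enum names are hypothetical, made up for illustration, and are not
taken from QEMU or the kernel.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical policy names, for illustration only. */
enum msix_alloc_policy {
    /* NORESIZE clear: new vectors can be enabled incrementally. */
    MSIX_ALLOC_INCREMENTAL,
    /*
     * NORESIZE set: the index is set up as a set, so enabling more
     * vectors means disabling the whole index first (or allocating
     * the full table up front).
     */
    MSIX_ALLOC_REINIT,
};

/* Fill *info for the MSI-X index of a vfio-pci device; returns the ioctl result. */
int query_msix_info(int device_fd, struct vfio_irq_info *info)
{
    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = VFIO_PCI_MSIX_IRQ_INDEX;
    return ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, info);
}

/* Map the flags reported by the kernel to one of the hypothetical policies. */
enum msix_alloc_policy msix_policy_from_info(const struct vfio_irq_info *info)
{
    return (info->flags & VFIO_IRQ_INFO_NORESIZE) ? MSIX_ALLOC_REINIT
                                                  : MSIX_ALLOC_INCREMENTAL;
}
```

A caller would run query_msix_info() once after opening the device and
pick its allocation strategy from msix_policy_from_info().
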
The flag exists in the uAPI to indicate a limitation of the
underlying implementation that has always existed.  Should we remove
that limitation, as Thomas now sees as possible, then QEMU wouldn't
need to make a choice whether to fully allocate the vector table or
incrementally tear-down and re-init.

> > could a) work to provide host kernel interfaces that allow us to remove
> > that noresize flag and b) decide whether QEMU's usage policy can be
> > improved on kernels where vector exhaustion is no longer an issue.
>
> Thomas can help confirm but looks noresize limitation is still there.
> b) makes more sense since Thomas thinks vector exhaustion is not
> an issue now (except one minor open about irte).

As noted elsewhere, a) is indeed a limitation of the host interfaces,
not implicit to MSI-X.  Obviously we can look at different QEMU
policies, including generating hardware faults to the VM on exhaustion
or unmask failures, interrupt injection or better inferring potential
vector usage.  Thanks,

Alex