On Thu, Mar 15, 2018 at 03:05:00PM +0000, Jean-Philippe Brucker wrote:
> Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
> the kernel inject MSIs from a physical device directly into the guest.
>
> It would be tempting to create the MSI routes at init time before starting
> vCPUs, when we can afford to exit gracefully. But some of it must be
> initialized when the guest requests it.
>
> * On the KVM side, MSIs must be enabled after devices allocate their IRQ
>   lines and irqchips are operational, which can happen until late_init.
>
> * On the VFIO side, hardware state of devices may be updated when setting
>   up MSIs. For example, when passing a virtio-pci-legacy device to the
>   guest:
>
>   (1) The device-specific configuration layout (in BAR0) depends on
>       whether MSIs are enabled or not in the device. If they are enabled,
>       the device-specific configuration starts at offset 24, otherwise it
>       starts at offset 20.
>   (2) Linux guest assumes that MSIs are initially disabled (doesn't
>       actually check the capability). So it reads the device config at
>       offset 20.
>   (3) Had we enabled MSIs early, host would have enabled the MSI-X
>       capability and device would return the config at offset 24.
>   (4) The guest would read junk and explode.
>
> Therefore we have to create MSI-X routes when the guest requests MSIs, and
> enable/disable them in VFIO when the guest pokes the MSI-X capability. We
> have to follow both physical and virtual state of the capability, which
> makes the state machine a bit complex, but I think it works.
>
> An important missing feature is the absence of pending MSI handling. When
> a vector or the function is masked, we should rewire the IRQFD to a
> special thread that keeps note of pending interrupts (or just poll the
> IRQFD before recreating the route?). And when the vector is unmasked, one
> MSI should be injected if it was pending. At the moment no MSI is
> injected, we simply disconnect the IRQFD and all messages are lost.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>

Just as a general point, I couldn't figure out how you handle the case where
a guest reconfigures the MSI BARs concurrently with an MMIO exit to the
previously configured BARs. Is there sufficient locking to serialise this?

Will
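
As an aside, the virtio-pci-legacy layout hazard in steps (1) to (4) of the
commit message can be sketched in C roughly as below. This is only an
illustration under stated assumptions: every function and constant name here
is made up for the example and is not taken from kvmtool or Linux. Only the
offsets 20 and 24 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical names. The two offsets are the ones quoted in the commit
 * message for the device-specific configuration in BAR0.
 */
enum {
	LEGACY_CFG_OFF_NO_MSIX = 20,
	LEGACY_CFG_OFF_MSIX    = 24,
};

/* Step (1): the layout depends on whether MSI-X is enabled in the device. */
size_t legacy_config_offset(bool msix_enabled)
{
	return msix_enabled ? LEGACY_CFG_OFF_MSIX : LEGACY_CFG_OFF_NO_MSIX;
}

/*
 * Steps (2)-(4): the guest reads at the offset it assumes, the device
 * answers with the layout it is actually in. A mismatch means the guest
 * reads the wrong bytes.
 */
bool guest_reads_junk(bool guest_assumes_msix, bool device_msix_enabled)
{
	return legacy_config_offset(guest_assumes_msix) !=
	       legacy_config_offset(device_msix_enabled);
}

/* The guest assumes MSI-X is off; compare lazy and early enabling. */
void check_early_enable_hazard(void)
{
	/* MSI-X enabled only on guest request: layouts agree. */
	assert(!guest_reads_junk(false, false));
	/* MSI-X enabled early by the host: layouts disagree. */
	assert(guest_reads_junk(false, true));
}
```

The point of the sketch is just that the physical capability state has to
track what the guest believes, which is why the routes are created on guest
request rather than at init time.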