Hi Jean-Philippe, I ran into an issue while testing these series on a Seattle based system. More below... Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> writes: > Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let > the kernel inject MSIs from a physical device directly into the guest. > > It would be tempting to create the MSI routes at init time before starting > vCPUs, when we can afford to exit gracefully. But some of it must be > initialized when the guest requests it. > > * On the KVM side, MSIs must be enabled after devices allocate their IRQ > lines and irqchips are operational, which can happen until late_init. > > * On the VFIO side, hardware state of devices may be updated when setting > up MSIs. For example, when passing a virtio-pci-legacy device to the > guest: > > (1) The device-specific configuration layout (in BAR0) depends on > whether MSIs are enabled or not in the device. If they are enabled, > the device-specific configuration starts at offset 24, otherwise it > starts at offset 20. > (2) Linux guest assumes that MSIs are initially disabled (doesn't > actually check the capability). So it reads the device config at > offset 20. > (3) Had we enabled MSIs early, host would have enabled the MSI-X > capability and device would return the config at offset 24. > (4) The guest would read junk and explode. > > Therefore we have to create MSI-X routes when the guest requests MSIs, and > enable/disable them in VFIO when the guest pokes the MSI-X capability. We > have to follow both physical and virtual state of the capability, which > makes the state machine a bit complex, but I think it works. > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> > --- > include/kvm/vfio.h | 54 +++++ > vfio/pci.c | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 664 insertions(+), 9 deletions(-) > [...] > +static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr) > +{ > + switch (cap_hdr->type) { > + case PCI_CAP_ID_MSIX: > + return PCI_CAP_MSIX_SIZEOF; > + default: > + pr_err("unknown PCI capability 0x%x", cap_hdr->type); > + return 0; > + } > +} > + > +/* > + * Copy capability from physical header into virtual header, and add it to the > + * virtual capability list. > + * > + * @fd_offset: offset of pci header into vfio device fd > + * @pos: offset of capability from start of header > + */ > +static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr, > + off_t fd_offset, off_t pos) > +{ > + int i; > + ssize_t size = vfio_pci_cap_size(cap_hdr); > + struct pci_device_header *hdr = &vdev->pci.hdr; > + struct pci_cap_hdr *out = (void *)hdr + pos; > + > + if (pread(vdev->fd, out, size, fd_offset + pos) != size) > + return -errno; > + > + out->next = 0; > + > + if (!hdr->capabilities) { > + hdr->capabilities = pos; > + hdr->status |= PCI_STATUS_CAP_LIST; > + } else { > + /* Add cap at end of list */ > + struct pci_cap_hdr *last; > + > + pci_for_each_cap(i, last, hdr) > + ; > + last->next = pos; Setting the next capability is somehow overwriting the vendor id in the virtual header. This leads to the device not being probed by the appropriate driver. I did a quick hack to prevent the vendor_id override and was able to proceed with probing the device. But there are still some issues with setting up the capabilities. Can you take a look to see what's going wrong? Thanks, Punit > + } > + > + return 0; > +} > + > static int vfio_pci_parse_caps(struct vfio_device *vdev) > { > + u8 pos; > + int ret; > + struct pci_cap_hdr cap; > + ssize_t sz = sizeof(cap); > + struct vfio_region_info *info; > struct vfio_pci_device *pdev = &vdev->pci; > > if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST)) > return 0; > > + pos = pdev->hdr.capabilities & ~3; > + info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info; > + > pdev->hdr.status &= ~PCI_STATUS_CAP_LIST; > pdev->hdr.capabilities = 0; > > - /* TODO: install virtual capabilities */ > + for (; pos; pos = cap.next) { > + if (pos >= PCI_DEV_CFG_SIZE) { > + dev_warn(vdev, "ignoring cap outside of config space"); > + return -EINVAL; > + } > + > + if (pread(vdev->fd, &cap, sz, info->offset + pos) != sz) { > + dev_warn(vdev, "failed to read from capabilities pointer (0x%x)", > + pos); > + return -EINVAL; > + } > + > + switch (cap.type) { > + case PCI_CAP_ID_MSIX: > + ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos); > + if (ret) { > + dev_warn(vdev, "failed to read capability structure %x", > + cap.type); > + return ret; > + } > + > + pdev->msi.pos = pos; > + pdev->irq_type = VFIO_PCI_IRQ_MSIX; > + break; > + > + /* Any other capability is hidden */ > + } > + } > > return 0; > } [...]