Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support

Punit Agrawal <punit.agrawal@xxxxxxx> · Mon, 31 Jul 2017 18:49:44 +0100

Hi Jean-Philippe,

I ran into an issue while testing these series on a Seattle based
system. More below...

Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> writes:

> Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
> the kernel inject MSIs from a physical device directly into the guest.
>
> It would be tempting to create the MSI routes at init time before starting
> vCPUs, when we can afford to exit gracefully. But some of it must be
> initialized when the guest requests it.
>
> * On the KVM side, MSIs must be enabled after devices allocate their IRQ
>   lines and irqchips are operational, which can happen until late_init.
>
> * On the VFIO side, hardware state of devices may be updated when setting
>   up MSIs. For example, when passing a virtio-pci-legacy device to the
>   guest:
>
>   (1) The device-specific configuration layout (in BAR0) depends on
>       whether MSIs are enabled or not in the device. If they are enabled,
>       the device-specific configuration starts at offset 24, otherwise it
>       starts at offset 20.
>   (2) Linux guest assumes that MSIs are initially disabled (doesn't
>       actually check the capability). So it reads the device config at
>       offset 20.
>   (3) Had we enabled MSIs early, host would have enabled the MSI-X
>       capability and device would return the config at offset 24.
>   (4) The guest would read junk and explode.
>
> Therefore we have to create MSI-X routes when the guest requests MSIs, and
> enable/disable them in VFIO when the guest pokes the MSI-X capability. We
> have to follow both physical and virtual state of the capability, which
> makes the state machine a bit complex, but I think it works.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>
> ---
>  include/kvm/vfio.h |  54 +++++
>  vfio/pci.c         | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 664 insertions(+), 9 deletions(-)
>

[...]

> +static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
> +{
> +	switch (cap_hdr->type) {
> +	case PCI_CAP_ID_MSIX:
> +		return PCI_CAP_MSIX_SIZEOF;
> +	default:
> +		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
> +		return 0;
> +	}
> +}
> +
> +/*
> + * Copy capability from physical header into virtual header, and add it to the
> + * virtual capability list.
> + *
> + * @fd_offset: offset of pci header into vfio device fd
> + * @pos: offset of capability from start of header
> + */
> +static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr,
> +			    off_t fd_offset, off_t pos)
> +{
> +	int i;
> +	ssize_t size = vfio_pci_cap_size(cap_hdr);
> +	struct pci_device_header *hdr = &vdev->pci.hdr;
> +	struct pci_cap_hdr *out = (void *)hdr + pos;
> +
> +	if (pread(vdev->fd, out, size, fd_offset + pos) != size)
> +		return -errno;
> +
> +	out->next = 0;
> +
> +	if (!hdr->capabilities) {
> +		hdr->capabilities = pos;
> +		hdr->status |= PCI_STATUS_CAP_LIST;
> +	} else {
> +		/* Add cap at end of list */
> +		struct pci_cap_hdr *last;
> +
> +		pci_for_each_cap(i, last, hdr)
> +			;
> +		last->next = pos;

Setting the next capability is somehow overwriting the vendor id in the
virtual header. This leads to the device not being probed by the
appropriate driver.

I did a quick hack to prevent the vendor_id override and was able to
proceed with probing the device. But there are still some issues with
setting up the capabilities.

Can you take a look to see what's going wrong?

Thanks,
Punit

> +	}
> +
> +	return 0;
> +}
> +
>  static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
> +	u8 pos;
> +	int ret;
> +	struct pci_cap_hdr cap;
> +	ssize_t sz = sizeof(cap);
> +	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> +	pos = pdev->hdr.capabilities & ~3;
> +	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> +
>  	pdev->hdr.status &= ~PCI_STATUS_CAP_LIST;
>  	pdev->hdr.capabilities = 0;
>  
> -	/* TODO: install virtual capabilities */
> +	for (; pos; pos = cap.next) {
> +		if (pos >= PCI_DEV_CFG_SIZE) {
> +			dev_warn(vdev, "ignoring cap outside of config space");
> +			return -EINVAL;
> +		}
> +
> +		if (pread(vdev->fd, &cap, sz, info->offset + pos) != sz) {
> +			dev_warn(vdev, "failed to read from capabilities pointer (0x%x)",
> +				 pos);
> +			return -EINVAL;
> +		}
> +
> +		switch (cap.type) {
> +		case PCI_CAP_ID_MSIX:
> +			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
> +			if (ret) {
> +				dev_warn(vdev, "failed to read capability structure %x",
> +					 cap.type);
> +				return ret;
> +			}
> +
> +			pdev->msi.pos = pos;
> +			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
> +			break;
> +
> +			/* Any other capability is hidden */
> +		}
> +	}
>  
>  	return 0;
>  }

[...]