Re: Splitting a multi-function PCI device between guests with VFIO?

On Mon, 2014-02-10 at 13:09 -0800, Roland Dreier wrote:
> Hi everyone,
> 
> I'm updating my dev environment to use the shiny new vfio
> infrastructure for PCI assignment to kvm guests, and I'm not able to
> do what I used to do with the old-school KVM passthrough.  In
> particular, I have, say, a two-port QLogic adapter that looks like:
> 
>     82:00.0 0200: 1077:8030 (rev 02)
>     82:00.1 0200: 1077:8030 (rev 02)
>     82:00.2 0c04: 1077:8031 (rev 02)
>     82:00.3 0c04: 1077:8031 (rev 02)
>     82:00.4 0280: 1077:8032 (rev 02)
>     82:00.5 0280: 1077:8032 (rev 02)
> 
> that is, each port gets three different PCI functions (one for NIC,
> one for FCoE and one for iSCSI).
> 
> I used to be able to assign 82:00.2 to one VM and 82:00.3 to a
> different VM by binding those devices to pci_stub and using "-device
> pci-assign,host=82:00.2" and "-device pci-assign,host=82:00.3" on my
> respective QEMU command lines.  (That let me have an initiator and
> target in separate VMs with one adapter in one dev system)
> 
> However, all of those PCI devices have the same iommu_group, so now if
> I bind the devices to vfio-pci and do "s/pci-assign/vfio-pci/", the
> second QEMU to start fails with something like
> 
>     qemu-system-x86_64: -device vfio-pci,host=82:00.3: vfio: error opening /dev/vfio/41: Device or resource busy
>     qemu-system-x86_64: -device vfio-pci,host=82:00.3: vfio: failed to get group 41
>     qemu-system-x86_64: -device vfio-pci,host=82:00.3: Device initialization failed.
>     qemu-system-x86_64: -device vfio-pci,host=82:00.3: Device 'vfio-pci' could not be initialized
> 
> Is there a way to split multi-function devices (with the same
> iommu_group) between VMs with vfio?

No, there's not.  vfio uses the iommu group as the unit of ownership for
devices.  The bounds of the group are determined by the visibility and
isolation of the devices.  Visibility is typically a matter of whether
there are any conventional PCI buses between the endpoint and root
complex that could mask the requestor ID.  The problem though is likely
the isolation.  Given that devices are independently visible to the
iommu, we next look at whether they are actually isolated from each
other.  On PCI we do this with the Access Control Services (ACS)
capability.  Unless ACS tells us that all the DMA from the functions is
forwarded upstream and there is no peer-to-peer between them, we must
assume that they are able to do DMA between the functions that is not
translated by the iommu.
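
As a rough illustration of the ACS check described above, the sketch
below walks the PCIe extended capability list in a device's config space
and reports the ACS control bits if the capability is present.  It is
not the kernel's implementation; the function names `find_acs` and
`enabled_bits` are made up for this example, and it only reports the
bits rather than deciding whether a device is isolated.

```python
import struct

PCI_EXT_CAP_START = 0x100    # extended capabilities begin at this offset
PCI_EXT_CAP_ID_ACS = 0x000D  # Access Control Services capability ID

# ACS control register bits (16-bit register at capability offset + 6)
ACS_BITS = {
    "SV": 0x01,  # source validation
    "TB": 0x02,  # translation blocking
    "RR": 0x04,  # P2P request redirect
    "CR": 0x08,  # P2P completion redirect
    "UF": 0x10,  # upstream forwarding
    "EC": 0x20,  # P2P egress control
    "DT": 0x40,  # direct translated P2P
}


def find_acs(config: bytes):
    """Return (capability, control) register values, or None if no ACS."""
    offset = PCI_EXT_CAP_START
    seen = set()  # guard against a looping capability list
    while offset and offset + 4 <= len(config) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable config space
        if header & 0xFFFF == PCI_EXT_CAP_ID_ACS:
            if offset + 8 > len(config):
                return None
            return struct.unpack_from("<HH", config, offset + 4)
        offset = (header >> 20) & 0xFFC  # next-capability pointer
    return None


def enabled_bits(ctrl: int):
    """Names of the ACS control bits set in `ctrl`."""
    return [name for name, bit in ACS_BITS.items() if ctrl & bit]
```

The config bytes would typically come from the device's sysfs `config`
file (for example `/sys/bus/pci/devices/0000:82:00.2/config`, assuming
PCI domain 0000).  Reading past the first 64 bytes of that file normally
requires root, so an unprivileged read will make `find_acs` return None
even on a device that has the capability.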

The risk of such DMA is that one guest could exploit another or, worse,
exploit the host.  With pci-assign this isolation validation was left to
userspace, where libvirt failed to properly check for multifunction ACS
and, even if it had, made it trivially easy to relax the ACS check with
a config option.  vfio does this enforcement in the kernel in order to
protect the kernel.
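
The grouping the kernel arrives at is exported through sysfs, so it is
easy to see which functions would have to go to the same guest.  The
following is a minimal sketch that reads `/sys/kernel/iommu_groups`; the
helper names `iommu_groups` and `group_mates` are invented for this
example.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map group number -> sorted list of device addresses in that group."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # no iommu support, or iommu disabled
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices.iterdir()
            )
    return groups


def group_mates(address, groups):
    """Return the other devices that share a group with `address`."""
    for members in groups.values():
        if address in members:
            return [m for m in members if m != address]
    return []
```

Calling `group_mates("0000:82:00.2", iommu_groups())` on the system
described above would presumably list the other five functions of the
adapter, which is why the second QEMU cannot open group 41.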

I did at one point offer a patch to allow users to override ACS, and
there was discussion around whether we could allow this but taint the
kernel when it's used.  It was ultimately rejected: the idea of
debugging a problem where a guest accidentally DMA'd into a host-owned
device instead of a guest-owned memory page, causing problems down the
road in both host and guest, made heads explode.

We do, however, have support for quirking specific devices that the
vendor has validated to provide ACS-equivalent isolation.  If we can get
a statement from QLogic to that effect for these devices, we can add
such a quirk.  Otherwise, pick better hardware, apply the out-of-tree
ACS override patch, or hold on to pci-assign until it gets removed from
the tree.  Thanks,

Alex
