Re: [RFC PATCH] vfio: VFIO PCI driver for Qemu

On Thu, 2012-07-26 at 11:35 +0300, Avi Kivity wrote:
> On 07/25/2012 10:53 PM, Alex Williamson wrote:
> > On Wed, 2012-07-25 at 22:30 +0300, Avi Kivity wrote:
> >> On 07/25/2012 08:03 PM, Alex Williamson wrote:
> >> > This adds PCI based device assignment to Qemu using the Linux VFIO
> >> > userspace driver interface.  After setting up VFIO device access,
> >> > devices can be added to Qemu guests using the vfio-pci device
> >> > option:
> >> >
> >> >  -device vfio-pci,host=1:10.1,id=net0
> >> >
> >> >
> >> 
> >> Let's use the same syntax as for kvm device assignment.  Then we can
> >> fall back on kvm when vfio is not available.  We can also have an
> >> optional parameter kernel-driver to explicitly select vfio or kvm.
> > 
> > This seems confusing to me, pci-assign already has options like
> > prefer_msi, share_intx, and configfd that vfio doesn't.  I'm sure vfio
> > will eventually get options that pci-assign won't have.  How is a user
> > supposed to figure out what options are actually available from -device
> > pci-assign,? 
> 
> Read the documentation.

And libvirt is supposed to parse the qemu-docs package matching the
installed qemu binary package to figure out what's supported?

> > Isn't this the same as asking to drop all model specific
> > devices and just use -device net,model=e1000... hey, we've been there
> > before ;)  Thanks,
> 
> It's not.  e1000 is a guest visible feature. vfio and kvm assignment do
> exactly the same thing, as far as the guest is concerned, just using a
> different driver.  This is more akin to -device virtio-net,vhost=on|off
> (where we also have a default and a fallback, which wouldn't make sense
> for model=e1000).

I understand and agree with your desire to make this transparent from the
user perspective, but I think the place to do that abstraction is
libvirt.  The qemu command line is just the final step in a process that
already needs to be aware of which backend will be used.  This is not
simply a matter of tweaking the qemu options and suddenly using vfio.  It
goes something like this:

   KVM                                     VFIO
1. Identify the assigned device         1. Identify the assigned device
2. Unbind from host driver              2. Identify the iommu group for the device
3. Bind to pci-stub                     3. Evaluate all the devices for the group
4. Launch qemu                          4. Unbind all devices in the group from host drivers
                                        5. Bind all devices in the group to vfio-pci
                                        6. Launch qemu
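
(For concreteness, a minimal sketch of the VFIO column, assuming the
1:10.1 device from the example above and the sysfs interfaces described
in Documentation/vfio.txt; the vendor:device ID written to new_id is
whatever lspci -n reports for the device, shown here with made-up
values:)

   # 2. Identify the iommu group for the device
   readlink /sys/bus/pci/devices/0000:01:10.1/iommu_group
   # -> ../../../../kernel/iommu_groups/26    (group number varies)

   # 3. Evaluate all the devices in the group
   ls /sys/bus/pci/devices/0000:01:10.1/iommu_group/devices

   # 4. Unbind every device in the group from its host driver
   echo 0000:01:10.1 > /sys/bus/pci/devices/0000:01:10.1/driver/unbind

   # 5. Bind the group's devices to vfio-pci (made-up vendor/device IDs)
   echo 8086 10ca > /sys/bus/pci/drivers/vfio-pci/new_id

   # 6. Launch qemu
   qemu-system-x86_64 ... -device vfio-pci,host=1:10.1,id=net0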

I've actually already had a report from an early adopter who did
everything under the VFIO list on the right, but happened to be using
qemu-kvm with the -device pci-assign option and couldn't figure out
what was going on.  Due to KVM's poor device ownership model, it was
more than happy to bind to a device owned by vfio-pci.  Imagine the
support questions we'd have to ask if we supported both via pci-assign:
what version of qemu are you using, does it default to vfio or kvm
assignment, has the distro modified it to switch the default...
VFIO offers certain advantages, for instance correctly managing the
IOMMU domain on systems like Andreas' where KVM can't manage the domain
of the bridge because it doesn't understand grouping.  There are also
obvious advantages in the device ownership model.  Users want to be sure
they're using these things.
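
(For what it's worth, that failure mode is quick to spot once you know
to look; a sketch, assuming the same 1:10.1 address as above, where
either command shows which driver currently owns the device:

   lspci -k -s 01:10.1                                 # "Kernel driver in use: ..."
   readlink /sys/bus/pci/devices/0000:01:10.1/driver   # e.g. .../drivers/vfio-pci

pci-assign cheerfully grabbing a device that vfio-pci already owns is
exactly what this would have caught.)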

Both KVM and VFIO do strive to make the device in the guest look as much
like it does on bare metal as possible, but we don't guarantee they're
identical and we don't guarantee to match each other.  So in fact, we
can expect subtle differences in how the guest sees it.  Things like the
capabilities exposed, the emulation/virtualization of some of those
capabilities, eventually things like express config space support and
AER error propagation.  These are all a bit more than "add vhost=on to
your virtio-net-pci options and magically your networking is faster".
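
(Those differences are visible from inside the guest with plain lspci;
as a sketch, with a hypothetical guest slot for the assigned device:

   lspci -vv -s 00:05.0   # the exposed capability list differs subtly per backend

so we can't honestly promise the two backends look identical.)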
Thanks,

Alex


