Re: [PATCH 0/5] VFIO core framework

Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> · Thu, 12 Jan 2012 15:56:47 -0500

On Tue, Jan 10, 2012 at 11:35:54AM -0700, Alex Williamson wrote:
> On Tue, 2012-01-10 at 11:26 -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Dec 21, 2011 at 02:42:02PM -0700, Alex Williamson wrote:
> > > This series includes the core framework for the VFIO driver.
> > > VFIO is a userspace driver interface meant to replace both the
> > > KVM device assignment code as well as interfaces like UIO.  Please
> > > see patch 1/5 for a complete description of VFIO, what it can do,
> > > and how it's designed.
> > > 
> > > This version and the VFIO PCI bus driver, for exposing PCI devices
> > > through VFIO, can be found here:
> > > 
> > > git://github.com/awilliam/linux-vfio.git vfio-next-20111221
> > > 
> > > A development version of qemu which includes a full working
> > > vfio-pci driver, indepdendent of KVM support, can be found here:
> > > 
> > > git://github.com/awilliam/qemu-vfio.git vfio-ng
> > > 
> > > Thanks,
> > 
> > Alex,
> > 
> > So I took a look at the patchset with two different things in mind this time:
> >  - What if you do not need to do any IRQ ack/de-ack etc in the host all of that
> >    is done in the guest (say you have an actual IOAPIC in the guest that is
> >    _not_ managed by QEMU).
> >  - What would be required to make this work with a different hypervisor - say Xen.
> > 
> > And the conclusions I came to that it would require some surgery - especially
> > as some of the IRQ, irqfs, etc code support is not required per say.
> > 
> > To me it seems to get this working with Xen (or perhaps with the Power machines
> > as well, as their hypervisor is similar to Xen in architecture?) we would need at
> > least two extra pieces of Linux kernel code: 
> > - Xen IOMMU, which really is just doing a whole bunch of xc_domain_memory_mapping
> >   the user-space iova calls. For the normal PCI devices operations it would just
> >   offload them to the existing DMA API.
> > - Xen VFIO PCI. Or at least make the VFIO PCI (in your vfio-next-20111221 branch)
> >   driver allow some abstraction. There are certain things we might done via alternate
> >   operations. Such as the interrupt handling - where we "bind" the IRQ to an event
> >   channel or make a hypercall to program the guest' MSI vectors. Perhaps there can
> >   be an "platform-specific" part of it.
> 
> Sure, I've envisioned that we'll have multiple iommu interfaces.  We'll
> need build-time and run-time selection.  I haven't implemented that yet
> since the iommu requirements are still developing.  Likewise, a
> vfio-xen-pci module is possible or we can look at whether we make the
> vfio-pci code too ugly by incorporating a dual-mode into that.

Yuck. Well, I am all up for making it pretty.

> 
> > In the userland:
> >  - In QEMU VFIO, make the interrupt part optional for certain parts (like we don't
> >    expect an IRQ to happen in the host).
> 
> Or can it be handled by vfio-xen-pci, which enables event channels
> through to xen?  It's possible the GET_IRQ_INFO ioctls could report a

Sure.
> flag indicating the type of notification available (eventfds being the
> initial option) and SET_IRQ_EVENTFDS could be generalized to take an
> array of structs other than eventfds.  For the non-Xen case, eventfds
> seem to provide us with the most flexibility since we can either connect
> them to userspace or just have userspace be the agent that connects the
> eventfd to an irqfd in another module.  See the (outdated) version of
> qemu-kvm vfio in this tree for an example (look for QEMU_KVM_BUILD):
> https://github.com/awilliam/qemu-kvm-vfio/blob/vfio/hw/vfio.c

Ah I see.
> 
> > I am curious to see how the Power folks have to deal with this? Perhaps the requirement
> > to write an PV IOMMU is not something they need to write?
> > 
> > In terms of this patchset, the "big" thing for me is that it moves the usual mechanism
> > of "unbind"/"bind" of using the SysFS to be done via ioctls. I get the reasoning for it
> > - cannot guarantee any locking, but doing it all in ioctls instead of configfs or sysfs
> > seems odd. But perhaps that is just me having gotten use to doing it in sysfs/configfs.
> > Certainly it makes it easier to program in QEMU/libvirt. And ultimately that is going
> > to be user for 99% of this.
> 
> Can you be more specific about which ioctl part you're referring to?  We
> bind/unbind each device to vfio-pci via the normal sysfs driver

Let me look again at the QEMU changes. I was thinking you did a bunch
of ioctls to assign a device, but I am probably getting it confused
with the vfio-group ioctls.

> interfaces.  Userspace binds itself to a group via ioctls, but that's
> because neither configfs or sysfs allow ioctl and I don't think it's
> possible to implement an ioctl-free vfio.  Trying to implement vfio
> across both configfs and chardev presents issues with ownership.

Right, one of them works. No need to do it across different subsystem.
> 
> > The requirement of the VFIO PCI driver to deal with all of the nasty work-arounds for
> > devices is nice. I do like the seperation - where this driver (VFIO core) deal
> > with _just_ the user facing portion. And the backends (just one right now - VFIO PCI)
> > gets to play with all the real hardware details.
> 
> Yep, and the iommu layer is intended to be the same, but is maybe not
> quite as evolved yet.
> 
> > So curious if your perception of this is similar to mine or if I had missed
> > something?
> 
> It seems like we have options for dealing with it via separate or
> modified iommu/device vfio modules and some tweaks to some of the
> ioctls.  Maybe I'm oversimplifying the xen requirements?  Thanks for the

That is the broad changes. Thought I am sure that once coding starts
we will find some new things. Hopefully they will all fit within these APIs.

> review and comments,
> 
> Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html