On Tue, Jan 10, 2012 at 11:35:54AM -0700, Alex Williamson wrote:
> On Tue, 2012-01-10 at 11:26 -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Dec 21, 2011 at 02:42:02PM -0700, Alex Williamson wrote:
> > > This series includes the core framework for the VFIO driver.
> > > VFIO is a userspace driver interface meant to replace both the
> > > KVM device assignment code as well as interfaces like UIO. Please
> > > see patch 1/5 for a complete description of VFIO, what it can do,
> > > and how it's designed.
> > >
> > > This version and the VFIO PCI bus driver, for exposing PCI devices
> > > through VFIO, can be found here:
> > >
> > > git://github.com/awilliam/linux-vfio.git vfio-next-20111221
> > >
> > > A development version of qemu which includes a full working
> > > vfio-pci driver, independent of KVM support, can be found here:
> > >
> > > git://github.com/awilliam/qemu-vfio.git vfio-ng
> > >
> > > Thanks,
> >
> > Alex,
> >
> > So I took a look at the patchset with two different things in mind this time:
> >  - What if you do not need to do any IRQ ack/de-ack etc. in the host, because all
> >    of that is done in the guest (say you have an actual IOAPIC in the guest that is
> >    _not_ managed by QEMU).
> >  - What would be required to make this work with a different hypervisor - say Xen.
> >
> > And the conclusion I came to is that it would require some surgery - especially
> > as some of the IRQ, irqfd, etc. code support is not required per se.
> >
> > To me it seems that to get this working with Xen (or perhaps with the Power machines
> > as well, as their hypervisor is similar to Xen in architecture?) we would need at
> > least two extra pieces of Linux kernel code:
> >  - Xen IOMMU, which really is just doing a whole bunch of xc_domain_memory_mapping
> >    for the user-space iova calls. For the normal PCI device operations it would just
> >    offload them to the existing DMA API.
> >  - Xen VFIO PCI.
> >    Or at least make the VFIO PCI (in your vfio-next-20111221 branch)
> >    driver allow some abstraction. There are certain things we might do via alternate
> >    operations, such as the interrupt handling - where we "bind" the IRQ to an event
> >    channel or make a hypercall to program the guest's MSI vectors. Perhaps there can
> >    be a "platform-specific" part of it.
>
> Sure, I've envisioned that we'll have multiple iommu interfaces. We'll
> need build-time and run-time selection. I haven't implemented that yet
> since the iommu requirements are still developing. Likewise, a
> vfio-xen-pci module is possible or we can look at whether we make the
> vfio-pci code too ugly by incorporating a dual-mode into that.

Yuck. Well, I am all up for making it pretty.

>
> > In the userland:
> >  - In QEMU VFIO, make the interrupt part optional for certain parts (like we don't
> >    expect an IRQ to happen in the host).
>
> Or can it be handled by vfio-xen-pci, which enables event channels
> through to xen? It's possible the GET_IRQ_INFO ioctls could report a

Sure.

> flag indicating the type of notification available (eventfds being the
> initial option) and SET_IRQ_EVENTFDS could be generalized to take an
> array of structs other than eventfds. For the non-Xen case, eventfds
> seem to provide us with the most flexibility since we can either connect
> them to userspace or just have userspace be the agent that connects the
> eventfd to an irqfd in another module. See the (outdated) version of
> qemu-kvm vfio in this tree for an example (look for QEMU_KVM_BUILD):
> https://github.com/awilliam/qemu-kvm-vfio/blob/vfio/hw/vfio.c

Ah I see.

>
> > I am curious to see how the Power folks have to deal with this? Perhaps a
> > PV IOMMU is not something they need to write?
> >
> > In terms of this patchset, the "big" thing for me is that it moves the usual mechanism
> > of "unbind"/"bind" from sysfs to ioctls.
> > I get the reasoning for it
> > - cannot guarantee any locking, but doing it all in ioctls instead of configfs or sysfs
> > seems odd. But perhaps that is just me having gotten used to doing it in sysfs/configfs.
> > Certainly it makes it easier to program in QEMU/libvirt. And ultimately that is going
> > to be the user for 99% of this.
>
> Can you be more specific about which ioctl part you're referring to? We
> bind/unbind each device to vfio-pci via the normal sysfs driver

Let me look again at the QEMU changes. I was thinking you did a bunch of ioctls to
assign a device, but I am probably getting it confused with the vfio-group ioctls.

> interfaces. Userspace binds itself to a group via ioctls, but that's
> because neither configfs nor sysfs allows ioctl and I don't think it's
> possible to implement an ioctl-free vfio. Trying to implement vfio
> across both configfs and chardev presents issues with ownership.

Right, one of them works. No need to do it across different subsystems.

>
> > The requirement of the VFIO PCI driver to deal with all of the nasty work-arounds for
> > devices is nice. I do like the separation - where this driver (VFIO core) deals
> > with _just_ the user facing portion. And the backends (just one right now - VFIO PCI)
> > get to play with all the real hardware details.
>
> Yep, and the iommu layer is intended to be the same, but is maybe not
> quite as evolved yet.
>
> > So curious if your perception of this is similar to mine or if I had missed
> > something?
>
> It seems like we have options for dealing with it via separate or
> modified iommu/device vfio modules and some tweaks to some of the
> ioctls. Maybe I'm oversimplifying the xen requirements? Thanks for the

Those are the broad changes. Though I am sure that once coding starts we will find
some new things. Hopefully they will all fit within these APIs.

> review and comments,
>
> Alex