On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote: > [ Sorry for the long hiatus, I've been wrapped up in other issues.] > > I think the fundamental issue to resolve is to decide on the model which the > VFIO driver presents to its users. > > Fundamentally, VFIO as part of the OS must protect the system from its users > and also protect the users from each other. No disagreement here. > > But another fundamental purpose of an OS to to present an abstract model of > the underlying hardware to its users, so that the users don't have to deal > with the full complexity of the hardware. > > So I think VFIO should present a 'virtual', abstracted PCI device to its users > whereas Michael has argued for a simpler model of presenting the real PCI > device config registers while preventing writes only to the registers which > would clearly disrupt the system. In fact, there is no contradiction. I am all for an abstracted API *and* I think the virtualization concept is a bad way to build this API. The 'virtual' interface you present is very complex and hardware specific: you do not hide literally *anything*. Deciding which functionality userspace needs, and exposing it to userspace as a set of APIs would be abstract. Instead you ask people to go read the PCI spec, the device spec, and bang on PCI registers, little-endian-ness and all, then try to interpret what do the virtual values mean. Example: How do I find # of MSI-X vectors? Sure, scan the capability list, find the capability, read the value, convert from little endian at each step. A page or two of code, and let's hope I have a ppc to test on. And note no driver has this code - they all use OS routines. So why wouldn't ioctl(dev, VFIO_GET_MSIX_VECTORS, &n); better serve the declared goal of presenting an abstracted PCI device to users? > Now, the virtual model *could* look little like the real hardware, and use > bunches of ioctls for everything it needs, Or reads/writes at special offsets, or sysfs attributes. > or it could look a lot like PCI and > use reads and writes of the virtual PCI config registers to trigger its > actions. The latter makes things more amenable to those porting drivers from > other environments. I really doubt this helps at all. Drivers typically use OS-specific APIs. It is very uncommon for them to touch standard registers, which is 100% of what your patch seem to be dealing with. And again, how about a small userspace library that would wrap vfio and add the abstractions for drivers that do need them? > I realize that to date the VFIO driver has been a bit of a mish-mash between > the ioctl and config based techniques; I intend to clean that up. And, yes, > the abstract model presented by VFIO will need plenty of documentation. And, it will need to be maintained forever, bugs and all. For example, if you change some register you emulated to fix a bug, to the driver this looks like a hardware change, and it will crash. The PCI spec has some weak versioning support, but it is mostly not a problem in that space: a specific driver needs to only deal with a specific device. We have a generic driver so PCI configuration space is a bad interface to use. > Since KVM/qemu already has its own notion of a virtual PCI device which it > presents to the guest OS, we either need to reconcile VFIO and qemu, or > provide a bypass of the VFIO virtual model. This could be direct access > through sysfs, or else an ioctl to VFIO. Since I have no internals knowledge > of qemu, I look to others to choose. Ah, so there will be 2 APIs, one for qemu, one for userspace drivers? > Other little things: > 1. Yes, I can share some code with sysfs if I can get the right EXPORTs there. > 2. I'll add multiple MSI support, but I wish to point out that even though the > PCI MSI API supports it, none of the architectures do. > 3. FLR needs work. I was foolish enough to assume that FLR wouldn't reset > BARs; now I know better. And as I said separately, drivers might reset BARs without FLR as well. As long as io/memory is disabled, we really should allow userspace write anything in BARs. And once we let it do it, most of the problem goes away. > 4. I'll get rid of the vfio config_map in sysfs; it was there for debugging. > 5. I'm still looking to support hotplug/unplug and power management stuff via > generic netlink notifications. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html