On 28.09.2011, at 04:40, Alex Williamson wrote:

> On Tue, 2011-09-27 at 16:28 -0500, Scott Wood wrote:
>> On 09/26/2011 07:45 PM, Alex Williamson wrote:
>>> On Mon, 2011-09-26 at 18:59 -0500, Scott Wood wrote:
>>>> On 09/26/2011 01:34 PM, Alex Williamson wrote:
>>>>> /* Reset the device */
>>>>> #define VFIO_DEVICE_RESET _IO(, ,)
>>>>
>>>> What generic way do we have to do this? We should probably have a way
>>>> to determine whether it's possible, without actually asking to do it.
>>>
>>> It's not generic, it could be a VFIO_DEVICE_PCI_RESET or we could add a
>>> bit to the device flags to indicate if it's available or we could add a
>>> "probe" arg to the ioctl to either check for existence or do it.
>>
>> Even with PCI, isn't this only possible if function-level reset is
>> supported?
>
> There are a couple other things we can do if FLR isn't present (D3hot
> transition, secondary bus reset, device specific resets are possible).
>
>> I think we need a flag.
>
> Ok, PCI has a pci_probe_reset_function() and pci_reset_function(). I'd
> probably mimic those in the vfio device ops. Common vfio code can probe
> the reset and set the flag appropriately and we can have a common
> VFIO_DEVICE_RESET ioctl that calls into the device ops reset function.
>
>> For devices that can't be reset by the kernel, we'll want the ability to
>> stop/start DMA access through the IOMMU (or other bus-specific means),
>> separate from whether the fd is open. If a device is assigned to a
>> partition and that partition gets reset, we'll want to disable DMA
>> before we re-use the memory, and enable it after the partition has reset
>> or quiesced the device (which requires the fd to be open).
>
> Maybe this can be accomplished via an iommu_detach_device() to
> temporarily disassociate it from the domain. We could also unmap all
> the DMA. Anyway, a few possibilities.
>
>>>>> /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
>>>>> #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int)
>>>>> #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int)
>>>>>
>>>>> Hope that covers it.
>>>>
>>>> It could be done this way, but I predict that the code (both kernel and
>>>> user side) will be larger. Maybe not much more complex, but more
>>>> boilerplate.
>>>>
>>>> How will you manage extensions to the interface?
>>>
>>> I would assume we'd do something similar to the kvm capabilities checks.
>>
>> This information is already built into the data-structure approach.
>
> If we define it to be part of the flags, then it's built-in to the ioctl
> approach too...
>
>>>> The table should not be particularly large, and you'll need to keep the
>>>> information around in some form regardless. Maybe in the PCI case you
>>>> could produce it dynamically (though I probably wouldn't), but it really
>>>> wouldn't make sense in the device tree case.
>>>
>>> It would be entirely dynamic for PCI, there's no advantage to caching
>>> it. Even for device tree, if you can't fetch it dynamically, you'd have
>>> to duplicate it between an internal data structure and a buffer reading
>>> the table.
>>
>> I don't think we'd need to keep the device tree path/index info around
>> for anything but the table -- but really, this is a minor consideration.
>>
>>>> You also lose the ability to easily have a human look at the hexdump for
>>>> debugging; you'll need a special "lsvfio" tool. You might want one
>>>> anyway to pretty-print the info, but with ioctls it's mandatory.
>>>
>>> I don't think this alone justifies duplicating the data and making it
>>> difficult to parse on both ends. Chances are we won't need such a tool
>>> for the ioctl interface because it's easier to get it right the first
>>> time ;)
>>
>> It's not just useful for getting the code right, but for e.g. sanity
>> checking that the devices were bound properly. I think such a tool
>> would be generally useful, no matter what the kernel interface ends up
>> being. I don't just use lspci to debug the PCI subsystem. :-)
>
> This is also a minor consideration. Looking at hexdumps isn't much to
> rely on for debugging and if we take the step of writing a tool, it's
> not much harder to write for either interface. The table is more akin
> to dumping the data, but I feel the ioctl is easier for how a driver
> would probably make use of the data (linear vs random access).
>
>>> Note that I'm not stuck on this interface, I was just thinking about how
>>> to generate the table last week, it seemed like a pain so I thought I'd
>>> spend a few minutes outlining an ioctl interface... turns out it's not
>>> so bad. Thanks,
>>
>> Yeah, it can work either way, as long as the information's there and
>> there's a way to add new bits of information, or new bus types, down the
>> road. Mainly a matter of aesthetics between the two.
>
> It'd be nice if David would chime back in since he objected to the
> table. Does an ioctl interface look better? Alex Graf, any opinions?

I'm honestly pretty indifferent on ioctl vs. linear read. I got the
impression that people dislike ioctls for whatever reason, so we went
ahead and did the design based on read().

With KVM, ioctls are a constant pain to extend, but so are the constant
sized fields here. Whatever you do, please introduce a "flags" field to
every struct you use and add some padding at the end, so it can possibly
be extended.


Alex
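[Editor's note] To make that closing request concrete, here is a minimal sketch of the kind of extensible ioctl argument being asked for; the struct name and payload fields are invented purely for illustration, only the flags + padding pattern matters:

```c
#include <linux/types.h>

/* Invented example; only the flags + padding pattern matters. */
struct vfio_example_info {
	__u32	flags;		/* kernel sets bits to say which fields are valid */
	__u32	num_regions;	/* example payload field */
	__u32	num_irqs;	/* example payload field */
	__u32	padding[5];	/* spare room so the struct can grow in place */
};
```

A later revision can turn part of the padding into a real field and define a new flag bit to say it is filled in, so old userspace and old kernels keep interoperating without a new ioctl number or a struct size change.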
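[Editor's note] For the reset scheme discussed earlier in the thread (bus code wraps the PCI helpers behind per-device ops, common code probes once and advertises the result in the device flags), a minimal sketch might look like the following. All vfio names here (the ops struct, the flag, the accessor) are hypothetical; only pci_probe_reset_function() and pci_reset_function() are existing PCI core helpers named in the thread.

```c
#include <linux/pci.h>

/* Hypothetical "reset is available" flag in the device info flags. */
#define VFIO_DEVICE_FLAGS_RESET	(1 << 0)

struct vfio_device;				/* hypothetical per-device state */

struct vfio_device_ops {			/* hypothetical bus callbacks */
	int (*reset_probe)(struct vfio_device *vdev);	/* 0 => reset available */
	int (*reset)(struct vfio_device *vdev);
};

/* Assumed accessor from the hypothetical vfio_device to its pci_dev. */
struct pci_dev *vfio_device_to_pci_dev(struct vfio_device *vdev);

/* Assuming the PCI core's probe helper can be made visible to vfio. */
int pci_probe_reset_function(struct pci_dev *dev);

/* The PCI backend would just wrap the PCI core helpers: */
static int vfio_pci_reset_probe(struct vfio_device *vdev)
{
	return pci_probe_reset_function(vfio_device_to_pci_dev(vdev));
}

static int vfio_pci_reset(struct vfio_device *vdev)
{
	return pci_reset_function(vfio_device_to_pci_dev(vdev));
}
```

Common code would then set VFIO_DEVICE_FLAGS_RESET when reset_probe() returns 0, and the bus-agnostic VFIO_DEVICE_RESET ioctl would simply call vdev->ops->reset(), failing with something like -EINVAL when the op is missing.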
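[Editor's note] Similarly, for the stop/start-DMA point raised for devices the kernel cannot reset, the existing IOMMU API already allows something along these lines; the helper names are made up and the surrounding vfio state (which domain and device to use) is assumed:

```c
#include <linux/iommu.h>

/*
 * Temporarily cut off a device's DMA while the partition's memory is
 * recycled, then re-attach once the guest has quiesced the device.
 */
static void example_dma_disable(struct iommu_domain *domain, struct device *dev)
{
	/* Device is no longer translated by this domain's mappings. */
	iommu_detach_device(domain, dev);
}

static int example_dma_enable(struct iommu_domain *domain, struct device *dev)
{
	/* Re-attach; the domain's existing mappings apply to the device again. */
	return iommu_attach_device(domain, dev);
}
```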