On Fri, 16 Jun 2017 18:11:17 +0100
"Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:

> On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > On Fri, 16 Jun 2017 11:32:04 -0400
> > Laine Stump <laine@xxxxxxxxxx> wrote:
> >
> > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
> > > >
> > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> so there's been an off-list discussion about finally implementing creation of
> > > >>> mediated devices with libvirt and it's more than desired to get as many opinions
> > > >>> on that as possible, so please do share your ideas. This did come up already as
> > > >>> part of some older threads ([1] for example), so this will be a respin of the
> > > >>> discussions. Long story short, we decided to put device creation off and focus
> > > >>> on the introduction of the framework as such first and build upon that later,
> > > >>> i.e. now.
> > > >>>
> > > >>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > >>>
> > > >>> ========================================
> > > >>> PART 1: NODEDEV-DRIVER
> > > >>> ========================================
> > > >>>
> > > >>> API-wise, device creation through the nodedev driver should be pretty
> > > >>> straightforward and without any issues, since virNodeDeviceCreateXML takes an XML
> > > >>> and does support flags.
> > > >>> Looking at the current device XML:
> > > >>>
> > > >>> <device>
> > > >>>   <name>mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f</name>
> > > >>>   <path>/sys/devices/pci0000:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f</path>
> > > >>>   <parent>pci_0000_03_00_0</parent>
> > > >>>   <driver>
> > > >>>     <name>vfio_mdev</name>
> > > >>>   </driver>
> > > >>>   <capability type='mdev'>
> > > >>>     <type id='nvidia-11'/>
> > > >>>     <iommuGroup number='13'/>
> > > >>>     <uuid>UUID</uuid> <!-- optional enhancement, see below -->
> > > >>>   </capability>
> > > >>> </device>
> > > >>>
> > > >>> We can ignore the <path>, <driver> and <iommuGroup> elements, since these are useless
> > > >>> during creation. We also cannot use <name> since we don't support arbitrary
> > > >>> names and we also can't rely on users providing a name in correct form which we
> > > >>> would need to further parse in order to get the UUID.
> > > >>> So since the only thing missing to successfully create an mdev using XML is
> > > >>> the UUID (if user doesn't want it to be generated automatically), how about
> > > >>> having a <uuid> subelement under <capability> just like PCIs have <domain> and
> > > >>> friends, USBs have <bus> & <device>, interfaces have <address> to uniquely
> > > >>> identify the device even if the name itself is unique.
> > > >>> Removal of a device should work as well, although we might want to
> > > >>> consider creating a *Flags version of the API.
> > > >>>
> > > >>> =============================================================
> > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > >>> =============================================================
> > > >>>
> > > >>> There were some doubts about auto-creation mentioned in [1], although they
> > > >>> weren't specified further. So hopefully, we'll get further in the discussion
> > > >>> this time.
> > > >>>
> > > >>> From my perspective there are two main reasons/benefits to that:
> > > >>>
> > > >>> 1) Convenience
> > > >>> For apps like virt-manager, user will want to add a host device transparently,
> > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> > > >>> higher management apps, like oVirt, even they might not care about the parent
> > > >>> device at all times and considering that they would need to enumerate the
> > > >>> parents, pick one, create the device XML and pass it to the nodedev driver, IMHO
> > > >>> it would actually be easier and faster to just do it directly through sysfs,
> > > >>> bypassing libvirt once again....
> > > >>
> > > >> The convenience only works if the policy we've provided in libvirt actually
> > > >> matches the policy the application wants. I think it is quite likely that with
> > > >> cloud the mdevs will be created out of band from the domain startup process.
> > > >> It is possible the app will just have a fixed set of mdevs pre-created when
> > > >> the host starts up. Or that the mgmt app wants the domain startup process to
> > > >> be a two phase setup, where it first allocates the resources needed, and later
> > > >> then tries to start the guest. This is why I keep saying that putting this kind
> > > >> of "convenient" policy in libvirt is a bad idea - it is essentially just putting
> > > >> a bit of virt-manager code into libvirt - more advanced apps will need more
> > > >> flexibility in this area.
> > > >>
> > > >>> 2) Future domain migration
> > > >>> Suppose now that the mdev backing physical devices support state dump and
> > > >>> reload. Chances are, that the corresponding mdev doesn't even exist or has a
> > > >>> different UUID on the destination, so libvirt would do its best to handle this
> > > >>> before the domain could be resumed.
> > > >>
> > > >> This is not an unusual scenario - there are already many other parts of the
> > > >> device backend config that need to change prior to migration, especially for
> > > >> anything related to host devices, so apps already have support for doing
> > > >> this, which is more flexible & convenient because it doesn't tie creation of
> > > >> the mdevs to running of the migrate command.
> > > >>
> > > >> IOW, I'm still against adding any kind of automatic creation policy for
> > > >> mdevs in libvirt. Just provide the node device API support.
> > > >
> > > > I'm not super clear on the extent of what you're against here, is it
> > > > all forms of device creation or only a placement policy? Are you
> > > > against any form of having the XML specify the non-instantiated mdev
> > > > that it wants? We've clearly made an important step with libvirt
> > > > supporting pre-created mdevs, but as a user of that support I find it
> > > > incredibly tedious. I typically do a dumpxml, copy out the UUID,
> > > > wonder what type of device it might have been last time, create it,
> > > > start the domain and cross my fingers. Pre-creating mdev devices is not
> > > > really practical, I might have use cases where I want multiple low-end
> > > > mdev devices and another where I have a single high-end device. Those
> > > > cannot exist at the same time. Requiring extensive higher level
> > > > management tools is not really an option either, I'm not going to
> > > > install oVirt on my desktop/laptop just so I can launch a GVT-g VM once
> > > > in a while (no offense). So I really hope that libvirt itself can
> > > > provide some degree of mdev creation.
> > > >
> > >
> > > Maybe there can be something in between the "all child devices must be
> > > pre-created" and "a child device will be automatically created on an
> > > automatically chosen parent device as needed". In particular, we could
> > > forego the "automatically chosen parent device" part of that.
> > > The guest configuration could simply contain the PCI address of the parent and the
> > > desired type of the child. If we did this there wouldn't be any policy
> > > decision to make - all the variables are determined - but it would make
> > > life easier for people running small hosts (i.e. no oVirt/Openstack, a
> > > single mdev parent device). Openstack and oVirt (and whoever) would of
> > > course be free to ignore this and pre-create pools of devices themselves
> > > in the name of more precise control and better predictability (just as,
> > > for example, OpenStack ignores libvirt's "pools of hostdev network
> > > devices" and instead manages the pool of devices itself and uses
> > > <interface type='hostdev'> directly).
> >
> > This seems not that substantially different from managed='yes' on a
> > vfio hostdev to me. It makes the device available to the VM before it
> > starts and returns it after. In one case that's switching the binding
> > on an existing device, in another it's creating and removing. Once
> > again, I can't tell from Dan's response if he's opposed to this entire
> > idea or just the aspects where libvirt needs to impose a policy
> > decision. For me personally, the functionality difference is quite
> > substantial.
>
> I'm fine with libvirt having APIs in the node device APIs to enable
> create/delete with libvirt, as well as using managed=yes in the same
> manner that we do for regular PCI devices (the bind/unbind to vfio
> or pci-back)
>
> I'm only against the creation/deletion of mdevs, as a side effect of
> starting/stopping the guest.

But this is exactly the useful case, and as Laine describes above can be
done without any policy decisions on the part of libvirt. The XML
defines a parent device and mdev type, libvirt tries to create it, just
as it might a tap device into a bridge, either it works and the VM is
started or it doesn't and we get an error. libvirt doesn't require tap
devices to exist prior to the VM starting.
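
To make the flow concrete (parent and type are given, the device is
created before the guest starts, any failure is just an error, and the
device is removed afterwards), here is a rough sketch done outside
libvirt. It is only an illustration under stated assumptions, not
libvirt code: it assumes the kernel's mdev sysfs interface (writing a
UUID to mdev_supported_types/<type>/create under the parent, and
writing 1 to /sys/bus/mdev/devices/<uuid>/remove), and the helper names
create_mdev, remove_mdev and transient_mdev are hypothetical.

```python
import os
import uuid
from contextlib import contextmanager


def create_mdev(parent, mdev_type, dev_uuid=None, sysfs="/sys"):
    """Create an mdev of mdev_type on the given parent PCI address.

    Assumes the kernel mdev sysfs layout. Raises OSError if the parent
    or type does not exist or the kernel rejects the request.
    """
    dev_uuid = dev_uuid or str(uuid.uuid4())
    create = os.path.join(sysfs, "bus", "pci", "devices", parent,
                          "mdev_supported_types", mdev_type, "create")
    with open(create, "w") as f:
        f.write(dev_uuid)
    return dev_uuid


def remove_mdev(dev_uuid, sysfs="/sys"):
    """Remove a previously created mdev by UUID."""
    remove = os.path.join(sysfs, "bus", "mdev", "devices", dev_uuid, "remove")
    with open(remove, "w") as f:
        f.write("1")


@contextmanager
def transient_mdev(parent, mdev_type, dev_uuid=None, sysfs="/sys"):
    """Hold an mdev for the lifetime of a guest.

    No placement policy is involved: parent and type are both given by
    the caller. If creation fails, the error propagates and the guest
    is never started.
    """
    dev_uuid = create_mdev(parent, mdev_type, dev_uuid, sysfs)
    try:
        yield dev_uuid
    finally:
        remove_mdev(dev_uuid, sysfs)
```

A caller would wrap the guest's lifetime in "with transient_mdev(...)",
so the device exists only while the VM does, much like managed='yes'
does for binding a PCI hostdev.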

Thanks,
Alex