Re: RFC: Creating mediated devices with libvirt

"Daniel P. Berrange" <berrange@xxxxxxxxxx> · Fri, 16 Jun 2017 16:40:07 +0100

On Fri, Jun 16, 2017 at 11:32:04AM -0400, Laine Stump wrote:
> On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > On Thu, 15 Jun 2017 09:33:01 +0100
> > "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
> > 
> >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> >>> Hi all,
> >>>
> >>> so there's been an off-list discussion about finally implementing creation of
> >>> mediated devices with libvirt and it's more than desired to get as many opinions
> >>> on that as possible, so please do share your ideas. This did come up already as
> >>> part of some older threads ([1] for example), so this will be a respin of the
> >>> discussions. Long story short, we decided to put device creation off and focus
> >>> on the introduction of the framework as such first and build upon that later,
> >>> i.e. now.
> >>>
> >>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> >>>
> >>> ========================================
> >>> PART 1: NODEDEV-DRIVER
> >>> ========================================
> >>>
> >>> API-wise, device creation through the nodedev driver should be pretty
> >>> straightforward and without any issues, since virNodeDeviceCreateXML takes an XML
> >>> and does support flags. Looking at the current device XML:
> >>>
> >>> <device>
> >>>   <name>mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f</name>
> >>>   <path>/sys/devices/pci0000:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f</path>
> >>>   <parent>pci_0000_03_00_0</parent>
> >>>   <driver>
> >>>     <name>vfio_mdev</name>
> >>>   </driver>
> >>>   <capability type='mdev'>
> >>>     <type id='nvidia-11'/>
> >>>     <iommuGroup number='13'/>
> >>>     <uuid>UUID</uuid> <!-- optional enhancement, see below -->
> >>>   </capability>
> >>> </device>
> >>>
> >>> We can ignore <path>, <driver>, <iommuGroup> elements, since these are useless
> >>> during creation. We also cannot use <name> since we don't support arbitrary
> >>> names and we also can't rely on users providing a name in correct form which we
> >>> would need to further parse in order to get the UUID.
> >>> So since the only thing missing to successfully create an mdev using XML is
> >>> the UUID (if user doesn't want it to be generated automatically), how about
> >>> having a <uuid> subelement under <capability> just like PCIs have <domain> and
> >>> friends, USBs have <bus> & <device>, interfaces have <address> to uniquely
> >>> identify the device even if the name itself is unique.
> >>> Removal of a device should work as well, although we might want to
> >>> consider creating a *Flags version of the API.
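
To make the creation XML above concrete, here is a minimal sketch in Python
that builds only the elements which matter at creation time: <parent>, the
mdev <type>, and the optional <uuid>. Note that <uuid> under <capability> is
only the enhancement proposed in this RFC, not an existing schema element,
and build_mdev_xml is a hypothetical helper name used for illustration.

```python
import uuid
import xml.etree.ElementTree as ET


def build_mdev_xml(parent, type_id, dev_uuid=None):
    """Build a node device XML for creating an mdev (illustrative only).

    parent   -- nodedev name of the parent, e.g. "pci_0000_03_00_0"
    type_id  -- mdev type, e.g. "nvidia-11"
    dev_uuid -- optional UUID; omitted if it should be auto-generated
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    cap = ET.SubElement(device, "capability", {"type": "mdev"})
    ET.SubElement(cap, "type", {"id": type_id})
    if dev_uuid is not None:
        # Assumption: <uuid> is the subelement proposed in this RFC.
        # uuid.UUID() rejects malformed input and normalises the text.
        ET.SubElement(cap, "uuid").text = str(uuid.UUID(dev_uuid))
    return ET.tostring(device, encoding="unicode")


xml_desc = build_mdev_xml(
    "pci_0000_03_00_0", "nvidia-11",
    "0cce8709-0640-46ef-bd14-962c7f73cc6f",
)
# With the libvirt Python bindings the string would presumably be passed
# on as conn.nodeDeviceCreateXML(xml_desc, 0); not executed here.
```

<path>, <driver> and <iommuGroup> are left out on purpose, since they only
exist once the device has been created.
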
> >>>
> >>> =============================================================
> >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> >>> =============================================================
> >>>
> >>> There were some doubts about auto-creation mentioned in [1], although they
> >>> weren't specified further. So hopefully, we'll get further in the discussion
> >>> this time.
> >>>
> >>> From my perspective there are two main reasons/benefits to that:
> >>>
> >>> 1) Convenience
> >>> For apps like virt-manager, user will want to add a host device transparently,
> >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even
> >>> higher-level management apps, like oVirt, might not care about the parent
> >>> device at all times and considering that they would need to enumerate the
> >>> parents, pick one, create the device XML and pass it to the nodedev driver, IMHO
> >>> it would actually be easier and faster to just do it directly through sysfs,
> >>> bypassing libvirt once again....  
> >>
> >> The convenience only works if the policy we've provided in libvirt actually
> >> matches the policy the application wants. I think it is quite likely that with
> >> cloud the mdevs will be created out of band from the domain startup process.
> >> It is possible the app will just have a fixed set of mdevs pre-created when
> >> the host starts up. Or that the mgmt app wants the domain startup process to
> >> be a two phase setup, where it first allocates the resources needed, and later
> >> then tries to start the guest. This is why I keep saying that putting this kind
> >> of "convenient" policy in libvirt is a bad idea - it is essentially just putting
> >> a bit of virt-manager code into libvirt - more advanced apps will need more
> >> flexibility in this area.
> >>
> >>> 2) Future domain migration
> >>> Suppose now that the mdev backing physical devices support state dump and
> >>> reload. Chances are that the corresponding mdev doesn't even exist or has a
> >>> different UUID on the destination, so libvirt would do its best to handle this
> >>> before the domain could be resumed.  
> >>
> >> This is not an unusual scenario - there are already many other parts of the
> >> device backend config that need to change prior to migration, especially for
> >> anything related to host devices, so apps already have support for doing
> >> this, which is more flexible & convenient because it doesn't tie creation of
> >> the mdevs to running of the migrate command.
> >>
> >> IOW, I'm still against adding any kind of automatic creation policy for
> >> mdevs in libvirt. Just provide the node device API support.
> > 
> > I'm not super clear on the extent of what you're against here, is it
> > all forms of device creation or only a placement policy?  Are you
> > against any form of having the XML specify the non-instantiated mdev
> > that it wants?  We've clearly made an important step with libvirt
> > supporting pre-created mdevs, but as a user of that support I find it
> > incredibly tedious.  I typically do a dumpxml, copy out the UUID,
> > wonder what type of device it might have been last time, create it,
> > start the domain and cross my fingers. Pre-creating mdev devices is not
> > really practical, I might have use cases where I want multiple low-end
> > mdev devices and another where I have a single high-end device.  Those
> > cannot exist at the same time. Requiring extensive higher level
> > management tools is not really an option either, I'm not going to
> > install oVirt on my desktop/laptop just so I can launch a GVT-g VM once
> > in a while (no offense).  So I really hope that libvirt itself can
> > provide some degree of mdev creation.
> 
> 
> Maybe there can be something in between the "all child devices must be
> pre-created" and "a child device will be automatically created on an
> automatically chosen parent device as needed". In particular, we could
> forego the "automatically chosen parent device" part of that. The guest
> configuration could simply contain the PCI address of the parent and the
> desired type of the child. If we did this there wouldn't be any policy
> decision to make - all the variables are determined - but it would make
> life easier for people running small hosts (i.e. no oVirt/OpenStack, a
> single mdev parent device). OpenStack and oVirt (and whoever) would of
> course be free to ignore this and pre-create pools of devices themselves
> in the name of more precise control and better predictability (just as,
> for example, OpenStack ignores libvirt's "pools of hostdev network
> devices" and instead manages the pool of devices itself and uses
> <interface type='hostdev'> directly).
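
For reference, the "parent address plus type" variant described above is
fully determined by its inputs, which the following Python sketch tries to
show. It assumes the kernel's mdev sysfs layout, where writing a UUID to
<parent>/mdev_supported_types/<type>/create instantiates the device, and
libvirt's pci_DDDD_BB_SS_F nodedev naming. The helper names and the
sysfs_root parameter are hypothetical, added only so the code can be tried
against a scratch directory.

```python
import os
import uuid


def pci_addr_from_nodedev_name(name):
    """Convert e.g. "pci_0000_03_00_0" to the PCI address "0000:03:00.0"."""
    parts = name.split("_")
    if len(parts) != 5 or parts[0] != "pci":
        raise ValueError("not a PCI nodedev name: %r" % name)
    _, domain, bus, slot, func = parts
    return "%s:%s:%s.%s" % (domain, bus, slot, func)


def mdev_create_path(parent_pci_addr, type_id, sysfs_root="/sys"):
    """Return the sysfs 'create' attribute for the given parent and type."""
    return os.path.join(sysfs_root, "bus", "pci", "devices",
                        parent_pci_addr, "mdev_supported_types",
                        type_id, "create")


def create_mdev(parent_pci_addr, type_id, dev_uuid, sysfs_root="/sys"):
    """Write the UUID to the 'create' attribute; needs root on a real host."""
    path = mdev_create_path(parent_pci_addr, type_id, sysfs_root)
    with open(path, "w") as attr:
        # uuid.UUID() validates the string before anything is written.
        attr.write(str(uuid.UUID(dev_uuid)))
    return path
```

Nothing in this has to choose between candidates: parent, type and UUID all
come from the caller, so there is no placement decision left to make.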

FWIW, I consider the pools of hostdev network devices feature a prime example
of something we shouldn't repeat. We encoded a specific policy into
libvirt and as a result the feature is largely useless for any non-trivial
use case. In retrospect we shouldn't have added that network pools magic
IMHO.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list