On Fri, 4 May 2018 10:16:09 +0100
Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:

> On Thu, May 03, 2018 at 12:58:00PM -0600, Alex Williamson wrote:
> > Hi,
> >
> > The previous discussion hasn't produced results, so let's start
> > over. Here's the situation:
> >
> >  - We currently have kernel and QEMU support for the QEMU vfio-pci
> >    display option.
> >
> >  - The default for this option is 'auto', so the device will
> >    attempt to generate a display if the underlying device supports
> >    it, currently only GVTg and some future release of NVIDIA vGPU
> >    (plus Gerd's sample mdpy and mbochs).
> >
> >  - The display option is implemented via two different mechanisms,
> >    a vfio region (NVIDIA, mdpy) or a dma-buf (GVTg, mbochs).
> >
> >  - Displays using dma-buf require OpenGL support, displays making
> >    use of region support do not.
> >
> >  - Enabling OpenGL support requires specific VM configurations,
> >    which libvirt /may/ want to facilitate.
> >
> >  - Probing display support for a given device is complicated by the
> >    fact that GVTg and NVIDIA both impose requirements on the
> >    process opening the device file descriptor through the vfio API:
> >
> >    - GVTg requires a KVM association or will fail to allow the
> >      device to be opened.
> >
> >    - NVIDIA requires that their vgpu-manager process can locate a
> >      UUID for the VM via the process commandline.
> >
> >  - These are both horrible impositions and prevent libvirt from
> >    simply probing the device itself.
>
> Agreed, these requirements are just horrific. Probing for features
> should not require this kind of environmental setup. I can just
> about understand & accept how we ended up here, because this
> scenario is not one that was strongly considered when the first
> impls were being done. I don't think we should accept it as a long
> term requirement though.
>
> > Erik Skultety, who initially raised the display question, has
> > identified one possible solution, which is to simply make the
> > display configuration the user's problem (apologies if I've
> > misinterpreted Erik). I believe this would work something like:
> >
> >  - libvirt identifies a version of QEMU that includes 'display'
> >    support for vfio-pci devices and defaults to adding display=off
> >    for every vfio-pci device [have we chosen the wrong default
> >    (auto) in QEMU?].
> >
> >  - New XML support would allow a user to enable display support on
> >    the vfio device.
> >
> >  - Resolving any OpenGL dependencies of that change would be left
> >    to the user.
> >
> > A nice aspect of this is that policy decisions are left to the user
> > and clearly no interface changes are necessary, perhaps with the
> > exception of deciding whether we've made the wrong default choice
> > for vfio-pci devices in QEMU.
>
> Unless I'm misunderstanding, this isn't really a solution to the
> problem, rather it is us simply giving up and telling someone else
> to try to fix the problem. The 'user' here is not a human - it is
> simply the next level up in the mgmt stack, eg OpenStack or oVirt.
> If we can't solve it acceptably in libvirt code, I don't have much
> hope that OpenStack can solve it in their code, since they have an
> even stronger need to automate everything.

But to solve this at any level other than the user suggests there is
one "right" answer to automatically configuring the device. Is there?
If a device supports a display, does the user necessarily want to
enable it? If there's a difference between enabling a display for a
local user or a remote user, is there any reasonable expectation that
we can automatically make that determination?
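To make that concrete, the same mdev device ends up with quite
different QEMU configurations depending on the answer. As a purely
illustrative sketch (the sysfs path, socket path, and UUID below are
made up):

  # local client on the same host, GL dma-bufs handed to the client:
  qemu-system-x86_64 ... \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>,display=on \
    -spice gl=on,unix,addr=/tmp/vm.spice,disable-ticketing

  # remote client, GL rendered host-side and sent over the wire as
  # pixels:
  qemu-system-x86_64 ... \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>,display=on \
    -display egl-headless \
    -spice port=5900,disable-ticketing

Neither of these is self-evidently the one "right" configuration to
pick automatically.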
> > On the other hand, if we do want to give libvirt a mechanism to
> > probe the display support for a device, we can make a simplified
> > QEMU instance be the mechanism through which we do that. For
> > example the script[1] can be provided with either a PCI device or
> > sysfs path to an mdev device and run a minimal VM instance meeting
> > the requirements of both GVTg and NVIDIA to report the display
> > support and GL requirements for a device. There are clearly some
> > unrefined and atrocious bits of this script, but it's only a proof
> > of concept, the process management can be improved and we can
> > decide whether we want to provide a qmp mechanism to introspect
> > the device rather than grep'ing error messages. The goal is simply
> > to show that we could choose to embrace QEMU and use it not as a
> > VM, but simply as a tool for poking at a device given the
> > restrictions the mdev vendor drivers have already imposed.
>
> Feels like a pretty heavyweight solution, one that just encourages
> the drivers to continue down the undesirable path they're already
> on, possibly making the situation even worse over time.

I'm not getting the impression that the vendor drivers are
considering a change, or necessarily can change. The NVIDIA UUID
requirement certainly seems arbitrary, but page tracking via KVM
seems to be more directly useful for maintaining the address space of
the device relative to the VM, even if that really wasn't the intent
of the mdev interface. Perhaps we could introduce vfio interfaces to
replace this, but is that just adding an unnecessary layer of
interaction for all but this probe activity? Maybe the KVM interface
should never have been added, but given that it exists, does it make
sense to say that it can't be used, or required? Thanks,

Alex
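P.S. For anyone who'd rather not read the script[1], the probe it
performs boils down to something like the invocation below. This is
only a sketch of the approach, not the script itself, and the mdev
path and UUID are placeholders:

  # accel=kvm provides the KVM association GVTg insists on; -uuid
  # puts a VM UUID on the commandline where NVIDIA's vgpu-manager
  # looks for it; display=on forces the device's display path.
  qemu-system-x86_64 -nodefaults -display none \
    -machine accel=kvm \
    -uuid "$(uuidgen)" \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>,display=on

Whether the device initializes cleanly, or complains about the lack
of GL support, is then scraped from QEMU's stderr - exactly the
grep'ing of error messages described above.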