Re: [PATCH 0/3] sample: vfio mdev display devices.

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 18 Apr 2018 12:31:53 -0600

On Mon,  9 Apr 2018 12:35:10 +0200
Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:

> This little series adds three drivers, for demo-ing and testing vfio
> display interface code.  There is one mdev device for each interface
> type (mdpy.ko for region and mbochs.ko for dmabuf).

Erik Skultety brought up a good question today regarding how libvirt is
meant to handle these different flavors of display interfaces and
knowing whether a given mdev device has display support at all.  It
seems that we cannot simply use the default display=auto because
libvirt needs to specifically configure gl support for a dmabuf type
interface versus not having such a requirement for a region interface,
perhaps even removing the emulated graphics in some cases (though I
don't think we have boot graphics through either solution yet).
Additionally, GVT-g seems to need the x-igd-opregion support
enabled(?), which is a non-starter for libvirt as it's an experimental
option!

Currently the only way to determine display support is through the
VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
their own they'd need to get to the point where they could open the
vfio device and perform the ioctl.  That means opening a vfio
container, adding the group, setting the iommu type, and getting the
device.  I was initially a bit appalled at asking libvirt to do that,
but the alternative is to put this information in sysfs, and in doing
that we risk needing to describe every nuance of the mdev device
through sysfs, so that it becomes a dumping ground for every possible
feature an mdev device might have.
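
For illustration, the probe sequence described above (container, group,
iommu type, device, then the GFX_PLANE query) might be sketched as
follows.  This is only a sketch under assumptions: the ioctl numbers,
flag values and struct layout are transcribed by hand from
<linux/vfio.h> and should be checked against the installed headers, and
open_mdev(), probe_gfx_plane() and their parameters are hypothetical
helper names, not an existing API.

```python
import fcntl
import os
import struct

# ioctl numbers follow <linux/vfio.h>: _IO(VFIO_TYPE, VFIO_BASE + n),
# with VFIO_TYPE = ';' and VFIO_BASE = 100 (hand-transcribed, verify).
VFIO_TYPE = ord(";")
VFIO_BASE = 100


def _vfio_io(nr):
    return (VFIO_TYPE << 8) | (VFIO_BASE + nr)


VFIO_SET_IOMMU = _vfio_io(2)
VFIO_GROUP_SET_CONTAINER = _vfio_io(4)
VFIO_GROUP_GET_DEVICE_FD = _vfio_io(6)
VFIO_DEVICE_QUERY_GFX_PLANE = _vfio_io(14)

VFIO_TYPE1_IOMMU = 1
VFIO_GFX_PLANE_TYPE_PROBE = 1 << 0
VFIO_GFX_PLANE_TYPE_DMABUF = 1 << 1
VFIO_GFX_PLANE_TYPE_REGION = 1 << 2

# struct vfio_device_gfx_plane_info: argsz, flags, drm_plane_type,
# drm_format, drm_format_mod (u64), eight u32 geometry fields, the
# region_index/dmabuf_id union, then 4 trailing bytes (64 in total).
GFX_PLANE_INFO = struct.Struct("=IIIIQ8II4x")


def probe_gfx_plane(device_fd, plane_type):
    """Return True if the device accepts a PROBE query for plane_type."""
    buf = bytearray(GFX_PLANE_INFO.size)
    struct.pack_into("=II", buf, 0, GFX_PLANE_INFO.size,
                     VFIO_GFX_PLANE_TYPE_PROBE | plane_type)
    try:
        fcntl.ioctl(device_fd, VFIO_DEVICE_QUERY_GFX_PLANE, buf)
    except OSError:
        return False  # EINVAL, ENOTTY, ...: treated as "not supported"
    return True


def open_mdev(group_path, mdev_uuid):
    """Walk container -> group -> iommu type -> device; return device fd.

    The container and group fds are deliberately left open, since the
    device fd depends on them for as long as it is in use.
    """
    container = os.open("/dev/vfio/vfio", os.O_RDWR)
    group = os.open(group_path, os.O_RDWR)
    # The group ioctl takes a pointer to the container fd.
    fcntl.ioctl(group, VFIO_GROUP_SET_CONTAINER, struct.pack("i", container))
    fcntl.ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)
    # A mutable buffer makes fcntl.ioctl() return the ioctl's result,
    # which for this call is the new device fd.
    name = bytearray(mdev_uuid.encode() + b"\0")
    return fcntl.ioctl(group, VFIO_GROUP_GET_DEVICE_FD, name)
```

A caller would do something like fd = open_mdev("/dev/vfio/<group>",
"<uuid>") and then call probe_gfx_plane(fd, VFIO_GFX_PLANE_TYPE_DMABUF)
and probe_gfx_plane(fd, VFIO_GFX_PLANE_TYPE_REGION), which shows how
much setup is needed just to answer a yes/no question.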

So I was ready to return and suggest that maybe libvirt should probe
the device to know about these ancillary configuration details, but
then I remembered that both mdev vGPU vendors had external dependencies
to even allow probing the device.  KVMGT will fail to open the device
if it's not associated with an instance of KVM, and NVIDIA vGPU, I
believe, will fail if the vGPU manager process cannot find the QEMU
instance to extract the VM UUID.  (Both of these were bad ideas.)

Therefore, how can libvirt know if a given mdev device supports a
display and which type of display it supports, and potentially which
vendor specific options might be required to further enable that
display (if they weren't experimental)?  A terrible solution would be
for libvirt to hard code that NVIDIA works with regions and Intel works
with dmabufs, but even then there's a backwards and forwards
compatibility problem: libvirt needs to support older kernels and
drivers where display support is not present, and newer drivers where
perhaps Intel is now doing regions and NVIDIA is supporting dmabuf, so
it cannot simply be assumed based on the vendor.  The only solution I
see down that path would be identifying specific {vendor,type} pairs
that support a predefined display type, but it's just absurd to think
that vendors would rev their mdev types to expose this and that libvirt
would keep a database mapping types to features.  We also have the name
and description attributes, but these are currently free form, so
libvirt rightfully ignores them entirely.  I don't know if we could
create a defined feature string within those free form strings.

Otherwise, it seems we have no choice but to dive into the pool of
exposing such features via sysfs, and we'll need to be vigilant about
feature creep or vendor specific features (e.g. we're not adding a
feature to indicate an opregion requirement).  How should we do this?
Perhaps a bar we can set is that if a feature cannot be discovered
through a standard vfio API, then it is not suitable for this sysfs
API.  Such things can be described via our existing mdev vendor
specific attribute interface.

We currently have this sysfs interface:

mdev_supported_types/
|-- $VENDOR_TYPE
|   |-- available_instances
|   |-- create
|   |-- description
|   |-- device_api
|   |-- devices
|   `-- name
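
As a rough sketch of what a management tool can already read from this
layout, the following walks a parent device's mdev_supported_types
directory and collects the read-only attributes for each $VENDOR_TYPE.
The helper names (read_attr, list_mdev_types) and the parent_dir
parameter are illustrative only; missing attributes are reported as
None rather than treated as errors.

```python
import os

# Read-only attributes from the tree above; "create" is write-only and
# "devices" is a directory, so both are skipped here.
ATTRS = ("name", "description", "device_api", "available_instances")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None  # attribute absent or unreadable


def list_mdev_types(parent_dir):
    """Map each type under parent_dir/mdev_supported_types to its attrs."""
    base = os.path.join(parent_dir, "mdev_supported_types")
    types = {}
    for type_id in sorted(os.listdir(base)):
        type_dir = os.path.join(base, type_id)
        types[type_id] = {
            attr: read_attr(os.path.join(type_dir, attr)) for attr in ATTRS
        }
    return types
```

Calling list_mdev_types() on the sysfs directory of an mdev parent
device returns one dictionary per type.  Note that nothing in the
result says whether the type has a display, which is the gap discussed
here.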

The ioctls for vfio devices which only provide information include:

VFIO_DEVICE_GET_INFO
VFIO_DEVICE_GET_REGION_INFO
VFIO_DEVICE_GET_IRQ_INFO
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
VFIO_DEVICE_QUERY_GFX_PLANE

We don't need to support all of these initially, but here's a starting
idea for what this may look like in sysfs:

$VENDOR_TYPE/
|-- available_instances
|-- create
|-- description
|-- device_api
|-- devices
|-- name
`-- vfio-pci
    `-- device
        |-- gfx_plane
        |   |-- dmabuf
        |   `-- region
        |-- irqs
        |   |-- 0
        |   |   |-- count
        |   |   `-- flags
        |   `-- 1
        |       |-- count
        |       `-- flags
        `-- regions
            |-- 0
            |   |-- flags
            |   |-- offset
            |   `-- size
            `-- 3
                |-- flags
                |-- offset
                `-- size

The existing device_api file reports "vfio-pci", so we base the device
API info in a directory named vfio-pci.  We're specifically exposing
device information, so we have a device directory.  We have a GFX_PLANE
query ioctl, so we have a gfx_plane sub-directory.  I imagine the
dmabuf and region files here expose either Y/N or 1/0.  I continue the
example with how we might expose irqs and regions, but even with
regions we can burrow down into how sparse mmap is exposed, how device
specific regions are described, etc.  Filling this in to completion
without a specific userspace need to expose the information is just an
exercise in bloating the kernel.
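
To make the proposal concrete, here is a minimal sketch of how a
consumer such as libvirt might read the gfx_plane files.  Everything
about the layout is an assumption taken from the tree above: the
<device_api>/device/gfx_plane path and the dmabuf and region files do
not exist in any kernel today, and read_flag() and display_support()
are hypothetical helper names.

```python
import os

TRUE_VALUES = {"y", "1"}
FALSE_VALUES = {"n", "0"}


def read_flag(path):
    """Parse a Y/N or 1/0 file; None if missing or unrecognized."""
    try:
        with open(path) as f:
            value = f.read().strip().lower()
    except OSError:
        return None  # e.g. an older kernel without the attribute
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None


def display_support(type_dir, device_api="vfio-pci"):
    """Report dmabuf/region support for one $VENDOR_TYPE directory.

    The <device_api>/device/gfx_plane/ path is the proposed layout,
    not an existing sysfs interface.
    """
    plane_dir = os.path.join(type_dir, device_api, "device", "gfx_plane")
    return {
        kind: read_flag(os.path.join(plane_dir, kind))
        for kind in ("dmabuf", "region")
    }
```

Returning None rather than False for a missing file keeps "this kernel
does not expose the information" distinct from "this type has no such
display", which matters for the compatibility concerns raised earlier.
A True for dmabuf would be the cue to configure gl support.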

That almost begins to look reasonable, but then we can only expose this
for mdev devices.  What if we were to hack a back door into a directly
assigned GPU that tracks the location of the active display in the
framebuffer and implement the GFX_PLANE interface for that?  We have no
sysfs representation for either the template or the actual device for
anything other than mdev.  This inconsistency with physically assigned
devices has been one of my arguments against enhancing mdev sysfs.

Thanks to anyone still reading this.  Any ideas on how we might help
libvirt fill this information void so that they can actually configure
a VM with a display device?  Thanks,

Alex