Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 18 Oct 2023 16:28:48 -0300

On Wed, Oct 18, 2023 at 12:29:25PM -0600, Alex Williamson wrote:

> > I think this should be configured when the VF is provisioned. If the
> > user does not want legacy IO bar support then the VFIO VF function
> > should not advertise the capability, and they won't get driver
> > support.
> > 
> > I think that is a very reasonable way to approach this - it is how we
> > approached similar problems for mlx5. The provisioning interface is
> > what "profiles" the VF, regardless of if VFIO is driving it or not.
> 
> It seems like a huge assumption that every device is going to allow
> this degree of specification in provisioning VFs.  mlx5 is a vendor
> specific driver, it can make such assumptions in design philosophy.

I don't think it is a huge assumption.  Some degree of configuration
is already mandatory just to get basic functionality, and it isn't
like virtio can really be a full fixed HW implementation on the
control plane.

So the assumption is that some device SW, which already must exist and
already must be configurable, just gains 1 more bit of configuration.
It does not seem like a big assumption to me at all.

Regardless, if we set an architecture/philosophy from the kernel side,
vendors will align to it.

> > The same argument is going to come with live migration. This same driver
> > will still bind and enable live migration if the virtio function is
> > profiled to support it. If you don't want that in your system then
> > don't profile the VF for migration support.
> 
> What in the virtio or SR-IOV spec requires a vendor to make this
> configurable?

The same part that describes how to make live migration work :)

> So nothing here is really "all in one place", it may be in the
> provisioning of the VF, outside of the scope of the host OS, it might
> be a collection of scripts or operators with device or interface
> specific tooling to configure the device.  Sometimes this configuration
> will be before the device is probed by the vfio-pci variant driver,
> sometimes in between probing and opening the device.

We don't have any in-tree examples of configuration between probing
and opening, and I'd like to keep it that way.

> I don't see why it becomes out of scope if the variant driver itself
> provides some means for selecting a device profile.  We have evidence
> both from mdev vGPUs and here (imo) that we can expect to see that
> behavior, so why wouldn't we want to attempt some basic shared
> interface for variant drivers to implement for selecting such a profile
> rather than add to this hodgepodge?

The GPU profiling approach is an artifact of the mdev sysfs. I do not
expect to actually do this in-tree. The function should be profiled
before it reaches VFIO, not after. This is often necessary anyhow
because a function can be bound to a kernel driver in almost all cases
too.

Consistently following this approach prevents future problems where we
end up with different ways to profile/provision functions depending on
what driver is attached (vfio/in-kernel). That would be a mess.
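
The ordering argued for here can be sketched as a toy model. This is
only an illustration under stated assumptions: `Function`, `provision`
and `bind` are hypothetical names, not kernel or devlink interfaces.
The point it shows is that the profile is fixed before any driver
attaches, so a VFIO driver and an in-kernel driver see the same thing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Function:
    """Hypothetical stand-in for a VF; not a real kernel structure."""
    name: str
    features: Set[str] = field(default_factory=set)
    driver: Optional[str] = None


def provision(fn: Function, features: Set[str]) -> None:
    """Profile the function. Only allowed before a driver is bound."""
    if fn.driver is not None:
        raise RuntimeError(f"{fn.name} is already bound to {fn.driver}")
    fn.features = set(features)


def bind(fn: Function, driver: str) -> List[str]:
    """Bind a driver; it can only use what was provisioned earlier."""
    fn.driver = driver
    return sorted(fn.features)


# The same provisioning step serves either kind of driver.
vf0 = Function("vf0")
provision(vf0, {"legacy_io", "migration"})
vfio_view = bind(vf0, "vfio variant driver")

vf1 = Function("vf1")
provision(vf1, {"legacy_io", "migration"})
kernel_view = bind(vf1, "in-kernel driver")
```

In this model there is exactly one place where a profile can be set,
and it does not depend on which driver ends up attached.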

> > > Another obvious option is sysfs, where we might imagine an optional
> > > "profiles" directory, perhaps under vfio-dev.  Attributes of
> > > "available" and "current" could allow discovery and selection of a
> > > profile similar to mdev types.  
> > 
> > IMHO it is a far too complex problem for sysfs.
> 
> Isn't it then just like devlink, not a silver bullet, but useful for
> some configuration? 

Yes, but that accepts the premise that configuration and provisioning
should happen on the VFIO side at all, which I think is not a good
direction.

> AIUI, devlink shot down a means to list available
> profiles for a device and a means to select one of those profiles.

And other things, yes.

> There are a variety of attributes in sysfs which perform this sort of
> behavior.  Specifying a specific profile in sysfs can be difficult, and
> I'm not proposing sysfs profile support as a mandatory feature, but I'm
> also not a fan of the vendor specific sysfs approach that out of tree
> drivers have taken.

It is my belief we are going to have to build some good general
infrastructure to support SIOV. The action to spawn, provision and
activate a SIOV function should be a generic infrastructure of some
kind. We have already been through a precursor to all this with mlx5's
devlink infrastructure for SFs (which are basically SIOV functions),
so we have pretty deep experience now.

mdev mushed all those steps into VFIO, but it belongs in different
layers. SIOV devices are not going to be exclusively consumed by VFIO.

If we have such a layer then it would be possible to also configure
VFIO "through the back door" of the provisioning layer in the kernel.

I think that is the closest we can get to some kind of generic API
here. The trouble is that it will not actually be generic because
provisioning is not generic or standardized. It doesn't eliminate the
need for having a user space driver component that actually
understands exactly what to do in order to fully provision something.
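
A rough sketch of that split, assuming a hypothetical
`ProvisioningLayer` class (nothing like it exists in the kernel today):
the spawn/provision/activate ordering is generic, while the provisioning
step itself is delegated to a device-specific callback that plays the
role of the user space driver component.

```python
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    SPAWNED = "spawned"
    PROVISIONED = "provisioned"
    ACTIVE = "active"


class ProvisioningLayer:
    """Hypothetical generic layer; only the step ordering is generic."""

    def __init__(self) -> None:
        self._provisioners: Dict[str, Callable[[dict], dict]] = {}
        self._functions: Dict[str, dict] = {}

    def register(self, device_type: str,
                 provisioner: Callable[[dict], dict]) -> None:
        # Device-specific knowledge lives in the callback, not here.
        self._provisioners[device_type] = provisioner

    def spawn(self, name: str, device_type: str) -> None:
        self._functions[name] = {
            "type": device_type, "state": State.SPAWNED, "config": {},
        }

    def provision(self, name: str, request: dict) -> None:
        fn = self._functions[name]
        if fn["state"] is not State.SPAWNED:
            raise RuntimeError("provision is only valid right after spawn")
        fn["config"] = self._provisioners[fn["type"]](request)
        fn["state"] = State.PROVISIONED

    def activate(self, name: str) -> State:
        fn = self._functions[name]
        if fn["state"] is not State.PROVISIONED:
            raise RuntimeError("activate requires a provisioned function")
        fn["state"] = State.ACTIVE
        return fn["state"]

    def config(self, name: str) -> dict:
        return dict(self._functions[name]["config"])
```

The generic part cannot validate or interpret `request`; only the
registered callback knows what a complete configuration looks like for
its device type.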

I don't know what to say about that from a libvirt perspective. Like
how does that world imagine provisioning network and storage
functions? All I know is at the OpenShift level it is done with
operators (aka user space drivers).

> The mdev type interface is certainly not perfect, but from it we've
> been able to develop mdevctl to allow persistent and complex
> configurations of mdev devices.  I'd like to see the ability to do
> something like that with variant drivers that offer multiple profiles
> without always depending on vendor specific interfaces.

I think profiles are too narrow an abstraction to be that broadly
useful beyond simple device types. Given the device variety we already
have, I don't know if there is an alternative to a user space driver to
manage provisioning. Indeed, that is how we see our actual deployments
working already.

IOW I'm worried we invest a lot of effort in VFIO profiling for little
return.

Jason