Re: [PATCH v3 10/10] vfio/qat: Add vfio_pci driver for Intel QAT VF devices

Yishai Hadas <yishaih@xxxxxxxxxx> · Thu, 29 Feb 2024 14:36:14 +0200

On 28/02/2024 21:07, Alex Williamson wrote:
On Mon, 26 Feb 2024 15:24:58 -0700
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

On Mon, 26 Feb 2024 15:49:52 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

On Mon, Feb 26, 2024 at 12:41:07PM -0700, Alex Williamson wrote:
On Mon, 26 Feb 2024 15:12:20 -0400
libvirt recently implemented support for managed="yes" with variant
drivers where it will find the best "vfio_pci" driver for a device
using an algorithm like Max suggested, but in practice it's not clear
how useful that will be considering devices like CX7 require
configuration before binding to the variant driver.  libvirt has no
hooks to specify or perform configuration at that point.

I don't think this is fully accurate (or at least not what was
intended); the VFIO device can be configured any time up until the VM
mlx5 driver reaches the device startup.

Is something preventing this? Did we accidentally cache the migratable
flag in vfio or something??

I don't think so, I think this was just the policy we had decided
relative to profiling VFs when they're created rather than providing a
means to do it through a common vfio variant driver interface[1].

Turns out that yes, migration support needs to be established at probe
time.  vfio_pci_core_register_device() expects migration_flags,
mig_ops, and log_ops to all be established by this point, which for
mlx5-vfio-pci occurs when the .init function calls
mlx5vf_cmd_set_migratable().

So the VF does indeed need to be "profiled" to enable migration prior
to binding to the mlx5-vfio-pci driver in order to report support.

Right, in the mlx5 case the 'profiling' of the VF needs to be done prior
to its probing/binding.

This is achieved today by running 'devlink <xxx> migratable enable' after
creating the VF.
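
To make the ordering constraint above concrete, here is a minimal user-space sketch in C. It is a toy model, not kernel code: every toy_* name and TOY_* constant is hypothetical and invented for illustration, and the only thing it borrows from the discussion is the sequence (the VF is profiled, then the init step populates the migration fields, then registration reports whatever is set at that moment).

```c
/* Toy user-space model of the probe-time ordering; NOT kernel code.
 * All toy_* names are hypothetical and exist only for this sketch. */
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the VF profile held by the device/firmware. */
struct toy_vf {
    bool fw_migratable;   /* true once 'migratable enable' has been applied */
};

/* Stand-in for a migration ops table; contents are irrelevant here. */
struct toy_mig_ops {
    int unused;
};

/* Stand-in for the vfio device state examined at registration time. */
struct toy_vfio_dev {
    unsigned int migration_flags;
    const struct toy_mig_ops *mig_ops;
    bool reports_migration;   /* snapshot taken when registering */
};

#define TOY_MIGRATION_STOP_COPY 0x1u

static const struct toy_mig_ops toy_ops = { 0 };

/* Models the init step: the VF profile is read once, at bind time. */
static void toy_init(struct toy_vfio_dev *vdev, const struct toy_vf *vf)
{
    vdev->migration_flags = 0;
    vdev->mig_ops = NULL;
    vdev->reports_migration = false;

    if (vf->fw_migratable) {
        vdev->migration_flags = TOY_MIGRATION_STOP_COPY;
        vdev->mig_ops = &toy_ops;
    }
}

/* Models registration: whatever is set now is what gets reported. */
static void toy_register(struct toy_vfio_dev *vdev)
{
    vdev->reports_migration =
        vdev->migration_flags != 0 && vdev->mig_ops != NULL;
}

/* Binding is init followed by registration; returns the reported support. */
static bool toy_bind(const struct toy_vf *vf, struct toy_vfio_dev *vdev)
{
    toy_init(vdev, vf);
    toy_register(vdev);
    return vdev->reports_migration;
}
```

In this model, calling toy_bind() on a VF whose fw_migratable is still false reports no migration support, and setting the flag afterwards changes nothing until the device is bound again, which mirrors why the profiling has to precede the bind.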

That also makes me wonder what happens if migration support is disabled
via devlink after binding the VF to mlx5-vfio-pci.  Arguably this could
be considered user error,

Yes, this is a clear user error.

 but what's the failure mode and support
implication?  Thanks,

The user will simply get an error from the firmware; the kernel and
everything around it will stay safe.

Further details:
On the source side, once the VM has been started, the 'disable' itself
will fail, as that configuration can't be changed once the VF is
already running/active.

On the target, as it's in a pending mode, the 'disable' will succeed.
However, the migration will just fail later on in the firmware upon
running a migration-related command, as expected.

Yishai