On Mon, 26 Feb 2024 15:49:52 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Mon, Feb 26, 2024 at 12:41:07PM -0700, Alex Williamson wrote: > > On Mon, 26 Feb 2024 15:12:20 -0400 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > On Mon, Feb 26, 2024 at 11:55:56AM -0700, Alex Williamson wrote: > > > > This will be the first intel vfio-pci variant driver, I don't think we > > > > need an intel sub-directory just yet. > > > > > > > > Tangentially, I think an issue we're running into with > > > > PCI_DRIVER_OVERRIDE_DEVICE_VFIO is that we require driver_override to > > > > bind the device and therefore the id_table becomes little more than a > > > > suggestion. Our QE is already asking, for example, if they should use > > > > mlx5-vfio-pci for all mlx5 compatible devices. > > > > > > They don't have to, but it works fine, there is no reason not to. > > > > But there's also no reason to. None of the metadata exposed by the > > driver suggests it should be a general purpose vfio-pci stand-in. > > I think the intent was to bind it to all the devices in its ID table > automatically for VFIO use and it should always be OK to do that. Not > doing so is a micro optimization. Automatic in what sense? We use PCI_DRIVER_OVERRIDE_DEVICE_VFIO to set the id_table entry to override_only, so any binding requires the user to specify a driver_override. Nothing automatically performs that driver_override except the recent support in libvirt for "managed" devices. Effectively override_only negates the id_table to little more than a suggestion to userspace of how the driver should be used. > Userspace binding it to other random things is a Bad Thing. Yes, and GregKH has opinions about who gets to keep the pieces of the broken system if a user uses dynamic ids or overrides to bind an unsupported driver to a device. > > > You are worried about someone wrongly loading a mlx5 driver on, say, > > > an Intel device? > > > > That's sort of where we're headed if we consider it valid to bind a CX5 > > to mlx5-vfio-pci just because they have a host driver with a similar > > name in common. > > I hope nobody is doing that, everyone should be using a tool that > checks that ID table.. If we lack a usable tool for that then it that > is the problemm. These are the sort of questions I'm currently fielding and yes, we don't really have any tools to make this automatic outside of the recent libvirt support. > > It's essentially a free for all. I worry about test matrices, user > > confusion, and being on guard for arbitrary devices at every turn in > > variant drivers if our policy is that they should all work > > equivalent to a basic vfio-pci-core implementation for anything. > > Today most of the drivers will just NOP themeslves if they can't find > a compatible PF driver, the most likely bug in this path would be a > NULL ptr deref or something in an untested path, or just total failure > to bind. > > We could insist that VF drivers are always able to find their PF or > binding fails, that would narrow things considerably. I think that conflicts with the guidance we've been giving for things like virtio-vfio-pci and nvgrace-gpu-vfio-pci where a device matching the id_table should at least get vfio-pci-core level functionality because there is no "try the next best driver" protocol defined if probing fails. OTOH, I think it's fair to fail probing of a device that does not match the id_table. Therefore to me, a CX7 VF with matching VID:DID should always bind to mlx5-vfio-pci regardless of the PF support for migration because that's what the id_table matches and the device has functionality with a vfio-pci-core driver. However I don't see that mlx5-vfio-pci has any obligation to bind to a CX5 device. > > libvirt recently implemented support for managed="yes" with variant > > drivers where it will find the best "vfio_pci" driver for a device > > using an algorithm like Max suggested, but in practice it's not clear > > how useful that will be considering devices likes CX7 require > > configuration before binding to the variant driver. libvirt has no > > hooks to specify or perform configuration at that point. > > I don't think this is fully accurate (or at least not what was > intended), the VFIO device can be configured any time up until the VM > mlx5 driver reaches the device startup. > > Is something preventing this? Did we accidentally cache the migratable > flag in vfio or something?? I don't think so, I think this was just the policy we had decided relative to profiling VFs when they're created rather than providing a means to do it though a common vfio variant driver interface[1]. Nothing necessarily prevents libvirt from configuring devices after binding to a vfio-pci variant driver, but per the noted thread the upstream policy is to have the device configured prior to giving it to vfio and any configuration mechanisms are specific to the device and variant driver, therefore what more is libvirt to do? > > The driverctl script also exists and could maybe consider the > > "vfio-pci" driver name to be a fungible alias for the best matching > > vfio_pci class driver, but I'm not sure if driverctl has a sufficient > > following to make a difference. > > I was also thinking about this tool as an alternative instruction to > using libvirt. > > Maybe this would ecourage more people to use it if it implemented it? driverctl is a very powerful and very unchecked tool. It loads the gun and points it at the user's foot and taunts them to pull the trigger. I suspect use cases beyond vfio are largely theoretical, but it allows specifying an arbitrary driver for almost any device. OTOH, libvirt is more targeted and while it will work automatically for a "managed" device specified via domain xml, it's also available as a utility through virsh, ex: # virsh nodedev-detach pci_0000_01_10_0 Which should trigger the code to bind to vfio-pci or the best variant driver. So while there's a place for driverctl and I wouldn't mind driver matching being added, I have trouble seeing enterprise distros really getting behind it to recommend as best practice without a lot more safety checks. Relative to variant driver probe() re-checking the id_table, I know we generally don't try to set policy in the kernel and this sort of behavior has been around for a long, long time with new_id, but the barrier to entry has been substantially lowered with things like driverctl that I'd still entertain the idea. Thanks, Alex [1]https://lore.kernel.org/all/20231018163333.GZ3952@xxxxxxxxxx/