On Wed, Aug 11, 2021 at 02:07:37PM -0500, Bjorn Helgaas wrote: > On Thu, Aug 05, 2021 at 09:23:57PM -0300, Jason Gunthorpe wrote: > > On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote: > > > > > > The first use will be to define a VFIO flag that indicates the PCI driver > > > > is a VFIO driver. > > > > > > Is there such a thing as a "VFIO driver" today? > > > > Yes. > > > > VFIO has long existed as a driver subsystem that binds drivers to > > devices in various bus types. In the case of PCI the admin moves a PCI > > device from normal operation to VFIO operation via something like: > > What specifically makes a driver a "VFIO driver"? It is a device driver whose probe function instantiates a "struct vfio_device" which binds it to the VFIO subsystem and triggers creation of the char devs, ioctls, etc. No different from every other subsystem, really. Eg a netdev driver creates a struct ndev_device, a TPM driver creates struct tpm_chip, etc. > supports the VFIO ioctls in include/uapi/linux/vfio.h? That by itself > doesn't require special treatment by the kernel, so I think there's > more here. The unique thing about VFIO, compared to all other subsystems, is that VFIO is a second choice for driver binding. A device will have a natural kernel driver, eg mlx5 naturally creates netdevs, and it has a VFIO driver option. Userspace selects if it wants the device to operate in normal mode or VFIO mode. The kernel should never move a device to VFIO mode automatically - which is the special behavior compared to any other normal pci_driver. > > echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override > > > > Other bus types (platform, acpi, etc) have a similar command to move > > them to VFIO. > > Do the other bus types have a flag analogous to > PCI_ID_F_VFIO_DRIVER_OVERRIDE? If we're doing something similar to > other bus types, it'd be nice if the approach were similar. They could, this series doesn't attempt it. I expect the approach to be similar as driver_override was copied from PCI to other busses. When this is completed I hope to take a look at it. > > PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id > > > > Allow device drivers to include match entries in the modules.alias file > > produced by kbuild that are not used for normal driver autoprobing and > > module autoloading. Drivers using these match entries can be connected to > > the PCI device manually, by userspace, using the existing driver_override > > sysfs. > > IIUC, the end result of this is basically a tweak to the existing > sysfs driver_override functionality. Yes.. > And I *think* (correct me if I'm wrong), this actually has nothing in > particular to do with VFIO. It's just that you want to expose some > device IDs that are only used for binding when driver_override is set. The general concept has nothing to do with VFIO but adding the "vfio_" prefix to the modalias is obviously VFIO specific. The entire point is to convay to userspace the information that the modules.alias line is just for vfio. We could imagine in future some other use for this, in which case the future user would use their own prefix, not vfio. > > When userspace wants to change a device to the VFIO subsystem userspace > > can implement a generic algorithm: > > > > 1) Identify the sysfs path to the device: > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 > > > > 2) Get the modalias string from the kernel: > > $ cat /sys/bus/pci/devices/0000:01:00.0/modalias > > pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 > > So far, I think this is all the existing behavior, unaffected by this > patch. Yes. > > 3) Prefix it with vfio_: > > vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 > > > > 4) Search modules.alias for the above string and select the entry that > > has the fewest *'s: > > alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci > > And this patch basically adds this modules.alias entry. Yes. > Previously vfio_pci contained no vendor/device IDs, and the only way > to bind it to a device was to either: > > - Modprobe the driver and write dynamic device IDs to the driver's > /sys/.../new_id. This should directly bind the driver to all > devices that match the new IDs (see new_id_store()). > > or > > - Write "vfio_pci" to the device's /sys/.../driver_override. > AFAICS, this won't bind anything (see driver_override_store()), > but if we call the driver's .probe() method via modprobe or > rescan, the driver_override will match any device regardless of > ID. Yes > IIUC, after this patch, you can add vendor/device IDs to a struct > pci_driver with this new flag. These IDs are advertised via > modules.alias. Yes > For driver binding, IDs with the new flag are eligible to match only > when driver_override is set to the matching driver. Yes > Setting a device's driver_override has *always* caused the matching > driver to bind. The only difference after this patch is that now we > give the driver an ID from its .id_table instead of pci_device_id_any. Almost - before a .id_table entried might be returned as well. The difference here is that there are "hidden" entries in the id_table that is only used by driver_overrride and we can return that hidden entry. > > 5) modprobe the matched module name: > > $ modprobe mlx5_vfio_pci > > I assume somewhere in here you need to unbind mlx5_core before binding > mlx5_vfio_pci? Er, yes, I skipped some steps here where unbind/bind has to be done > > 6) cat the matched module name to driver_override: > > echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override > > Don't you need something here to trigger the driver attach, i.e., > should step 5 and step 6 be swapped? What if the driver is already > loaded? The full sequence is more like: echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind echo 0000:01:00.0 > /sys/bus/pci/drivers_probe > Can you modprobe again to make it bind to a second device? modprobe is a single-shot, it just loads the module and doesn't trigger any driver binding. modprobing a second time is a NOP. > I see drivers/vfio/platform/vfio_platform.c; is that what you mean? Yes, look around vfio_platform_acpi_probe() > I don't see any VFIO things with ACPI in their name, so maybe I'm > looking the wrong place. If this is purely *plans* for the future, > maybe say something like "planned VFIO drivers ..." Sure Jason