On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:
> > The first use will be to define a VFIO flag that indicates the PCI driver
> > is a VFIO driver.
>
> Is there such a thing as a "VFIO driver" today?

Yes. VFIO has long existed as a driver subsystem that binds drivers to
devices of various bus types. In the case of PCI, the admin moves a PCI
device from normal operation to VFIO operation via something like:

  echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

Other bus types (platform, acpi, etc.) have a similar command to move
them to VFIO.

> > VFIO drivers have a few special properties compared to normal PCI drivers:
> >  - They do not automatically bind. VFIO drivers are used to swap out the
> >    normal driver for a device and convert the PCI device to the VFIO
> >    subsystem.
>
> The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
> [sic] entry is returned," which sounds like the opposite of "do not
> automatically bind."  Might be exposing my VFIO ignorance here.

The comment is in error.

> >    The admin must make this choice and following the current uAPI this is
> >    usually done by using the driver_override sysfs.
>
> I'm not sure "converting PCI device to the VFIO subsystem" is the
> right way to phrase this, but whatever it is, make this idea specific,
> e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.

The next version will include the sequence we worked out with Alex in
the other branch of this thread. See below.

> >  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
> >    them with 'vfio_pci:' instead of the normal 'pci:'.
> >
> >    This allows the userspace machinery that switches devices to VFIO to
> >    know what kernel drivers support what devices and allows it to trigger
> >    the proper device_override.
>
> What does "switch device to VFIO" mean?  I could be reading this too
> literally (in my defense, I'm not a VFIO expert), but AFAICT this is
> not something you do to the *device*.
It means changing the struct device_driver bound to the struct device,
which is an operation the admin performs on the device object.

> I guess maybe this is something like "prevent the normal driver from
> claiming the device so we can use VFIO instead"?

No.

> Does "using VFIO" mean getting vfio-pci to claim the device?

If by claim you mean bind a pci_driver to the pci_dev, then yes.

> > As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
> > keeps todays behavior the same. VFIO remains on the side, is never
> > autoloaded and can only be activated by direct admin action.
>
> s/todays/today's/
>
> > This patch is the infrastructure to provide the information in the
> > modules.alias to userspace and enable the only PCI VFIO driver. Later
> > series introduce additional HW specific VFIO PCI drivers.
>
> s/the only/only the/ ?  (Not sure what you intend, but "the only"
> doesn't seem right)

"The only" is correct: at this point in the sequence there is only one
pci_driver that uses this, vfio_pci.ko.

> Sorry, I know I'm totally missing the point here.

Let's try again:

PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id

Allow device drivers to include match entries in the modules.alias file
produced by kbuild that are not used for normal driver autoprobing and
module autoloading. Drivers using these match entries can be connected
to the PCI device manually, by userspace, using the existing
driver_override sysfs.

Add the flag PCI_ID_F_VFIO_DRIVER_OVERRIDE to indicate that the match
entry is for the VFIO subsystem. These match entries are prefixed with
"vfio_" in the modules.alias.

For example, the resulting modules.alias may have:

  alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
  alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

In this example, mlx5_core and mlx5_vfio_pci both match the same PCI
device.
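To illustrate how a device's modalias selects between entries like these,
here is a minimal POSIX sh sketch. It assumes that each modules.alias
pattern is read as a shell glob (which is what the '*' fields suggest),
that only "vfio_" entries are considered, and that the most specific
match (fewest '*') wins, with ties going to the first line listed. The
function name find_vfio_module is hypothetical and is not part of kmod,
udev, or any existing tool.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): print the VFIO module whose
# "vfio_" alias matches the given modalias with the fewest '*' wildcards.
# Usage: find_vfio_module <modalias> <modules.alias path>
find_vfio_module() {
    modalias=$1
    alias_file=$2
    want="vfio_$modalias"   # prefix the device's modalias with vfio_
    best_mod=""
    best_stars=""

    # Each relevant line looks like: alias <glob-pattern> <module>
    while read -r kw pattern module; do
        [ "$kw" = "alias" ] || continue
        case $pattern in
            vfio_*) ;;              # only VFIO override entries
            *) continue ;;
        esac
        # An unquoted $pattern in a case label is matched as a shell glob.
        case $want in
            $pattern) ;;
            *) continue ;;
        esac
        # Fewer '*' means a more specific entry; keep the first on a tie.
        stars=$(( $(printf '%s' "$pattern" | tr -cd '*' | wc -c) ))
        if [ -z "$best_stars" ] || [ "$stars" -lt "$best_stars" ]; then
            best_stars=$stars
            best_mod=$module
        fi
    done < "$alias_file"

    [ -n "$best_mod" ] || return 1
    printf '%s\n' "$best_mod"
}
```

If the alias file contains the three example lines, calling
find_vfio_module pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 /lib/modules/$(uname -r)/modules.alias
should print mlx5_vfio_pci. The mlx5 entry has five '*' and the generic
vfio_pci entry has seven, so the mlx5 entry is the more specific match.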
The kernel will autoload and autobind mlx5_core, but the kernel and
udev mechanisms will ignore mlx5_vfio_pci.

When userspace wants to change a device to the VFIO subsystem, it can
implement a generic algorithm:

 1) Identify the sysfs path to the device:
      /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
 2) Get the modalias string from the kernel:
      $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
      pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
 3) Prefix it with vfio_:
      vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
 4) Search modules.alias for entries matching the above string and
    select the one that has the fewest *'s:
      alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
 5) modprobe the matched module name:
      $ modprobe mlx5_vfio_pci
 6) Write the matched module name to driver_override:
      $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

The algorithm is independent of bus type. In the future, other buses
with VFIO device drivers, like platform and ACPI, can use this
algorithm as well.

This patch is the infrastructure to provide the information in the
modules.alias to userspace. Convert the only VFIO pci_driver, which
results in one new line in the modules.alias:

  alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

Later series introduce additional HW-specific VFIO PCI drivers, such
as mlx5_vfio_pci.

> > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > index 7c97fa8e36bc..f53b38e8f696 100644
> > +++ b/scripts/mod/file2alias.c
> > @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
> >  	return 1;
> >  }
> >
> > -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
> > +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
> >  static int do_pci_entry(const char *filename,
> >  			void *symval, char *alias)
> >  {
> > @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
> >  	DEF_FIELD(symval, pci_device_id, subdevice);
> >  	DEF_FIELD(symval, pci_device_id, class);
> >  	DEF_FIELD(symval, pci_device_id, class_mask);
> > +	DEF_FIELD(symval, pci_device_id, flags);
>
> I'm a little bit wary of adding a new field to this kernel/user
> interface just for this single bit.  Maybe it's justified but feels
> like it's worth being careful.

A couple of us looked at this in one of the RFC threads. As far as we
could tell, this is not a kernel/user interface. It is an interface
within kbuild between gcc and file2alias, and it is not used or really
exported beyond the kernel build sequence. Debian code search didn't
find anything, for instance.

modules.alias, as output by file2alias during kbuild, is the canonical
"kernel/user" interface here. Everything that needs this data should be
using that.

Thanks,
Jason