Re: [PATCH 09/12] PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id

Hi Bjorn,

On 8/6/2021 3:23 AM, Jason Gunthorpe wrote:
On Wed, Aug 04, 2021 at 03:34:12PM -0500, Bjorn Helgaas wrote:

The first use will be to define a VFIO flag that indicates the PCI driver
is a VFIO driver.
Is there such a thing as a "VFIO driver" today?
Yes.

VFIO has long existed as a driver subsystem that binds drivers to
devices in various bus types. In the case of PCI the admin moves a PCI
device from normal operation to VFIO operation via something like:

echo vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

Other bus types (platform, acpi, etc) have a similar command to move
them to VFIO.

VFIO drivers have a few special properties compared to normal PCI drivers:
  - They do not automatically bind. VFIO drivers are used to swap out the
    normal driver for a device and convert the PCI device to the VFIO
    subsystem.
The comment below says "... any matching PCI_ID_F_DRIVER_OVERRIDE
[sic] entry is returned," which sounds like the opposite of "do not
automatically bind."  Might be exposing my VFIO ignorance here.
The comment is in error
    The admin must make this choice and following the current uAPI this is
    usually done by using the driver_override sysfs.
I'm not sure "converting PCI device to the VFIO subsystem" is the
right way to phrase this, but whatever it is, make this idea specific,
e.g., by "echo pci-stub > /sys/.../driver_override" or whatever.
The next version will include the sequence we worked out with Alex in
the other branch of this thread. See below

  - The modules.alias includes the IDs of the VFIO PCI drivers, prefixing
    them with 'vfio_pci:' instead of the normal 'pci:'.

    This allows the userspace machinery that switches devices to VFIO to
    know what kernel drivers support what devices and allows it to trigger
    the proper device_override.
What does "switch device to VFIO" mean?  I could be reading this too
literally (in my defense, I'm not a VFIO expert), but AFAICT this is
not something you do to the *device*.
It means change the struct device_driver bound to the struct device -
which is an operation that the admin does on the device object.

I guess maybe this is something like "prevent the normal driver from
claiming the device so we can use VFIO instead"?
no..

Does "using VFIO" mean getting vfio-pci to claim the device?
If by claim you mean bind a pci_driver to the pci_dev, then yes.

As existing tools do not recognize the "vfio_pci:" mod-alias prefix this
keeps todays behavior the same. VFIO remains on the side, is never
autoloaded and can only be activated by direct admin action.
s/todays/today's/

This patch is the infrastructure to provide the information in the
modules.alias to userspace and enable the only PCI VFIO driver. Later
series introduce additional HW specific VFIO PCI drivers.
s/the only/only the/ ?  (Not sure what you intend, but "the only"
doesn't seem right)
"the only" is correct, at this point in the sequence there is only one
pci_driver that uses this, vfio_pci.ko

Sorry, I know I'm totally missing the point here.
Let's try again..

PCI: Add a PCI_ID_F_VFIO_DRIVER_OVERRIDE flag to struct pci_device_id

Allow device drivers to include match entries in the modules.alias file
produced by kbuild that are not used for normal driver autoprobing and
module autoloading. Drivers using these match entries can be connected to
the PCI device manually, by userspace, using the existing driver_override
sysfs.

Add the flag PCI_ID_F_VFIO_DRIVER_OVERRIDE to indicate that the match
entry is for the VFIO subsystem. These match entries are prefixed with
"vfio_" in the modules.alias.

For example the resulting modules.alias may have:

   alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
   alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
   alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

In this example mlx5_core and mlx5_vfio_pci match to the same PCI
device. The kernel will autoload and autobind to mlx5_core but the kernel
and udev mechanisms will ignore mlx5_vfio_pci.
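As an illustration only (this is not part of the patch), the following Python sketch matches a device modalias against the three example aliases using shell-style globbing, which approximates how alias lookup behaves. The helper name `modules_for` is hypothetical, and the modalias string is the one used in the algorithm example further down.

```python
from fnmatch import fnmatchcase

# The three example modules.alias entries, as (pattern, module) pairs.
ALIASES = [
    ("pci:v000015B3d00001021sv*sd*bc*sc*i*", "mlx5_core"),
    ("vfio_pci:v000015B3d00001021sv*sd*bc*sc*i*", "mlx5_vfio_pci"),
    ("vfio_pci:v*d*sv*sd*bc*sc*i*", "vfio_pci"),
]


def modules_for(modalias):
    """Return every module whose alias pattern matches the given string."""
    return [module for pattern, module in ALIASES
            if fnmatchcase(modalias, pattern)]


# The modalias the kernel reports for the device always starts with "pci:".
DEVICE_MODALIAS = "pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00"

# Normal autoloading only ever looks up the unprefixed string.
print(modules_for(DEVICE_MODALIAS))

# The VFIO entries are found only when the "vfio_" prefix is added on purpose.
print(modules_for("vfio_" + DEVICE_MODALIAS))
```

Because the device modalias never carries the "vfio_" prefix, a lookup of the unprefixed string can only ever reach mlx5_core in this example.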

When userspace wants to move a device to the VFIO subsystem, it can
implement a generic algorithm:

    1) Identify the sysfs path to the device:
     /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0

    2) Get the modalias string from the kernel:
     $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
     pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

    3) Prefix it with vfio_:
     vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00

    4) Search modules.alias for the above string and select the entry that
       has the fewest *'s:
     alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci

    5) modprobe the matched module name:
     $ modprobe mlx5_vfio_pci

    6) Write the matched module name to driver_override:
     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override

The algorithm is independent of bus type. In the future, other buses with
VFIO device drivers, like platform and ACPI, can use this algorithm as
well.
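Steps 3 and 4 of the algorithm could be sketched in Python roughly as follows. This is a minimal illustration, not an existing tool: the function name `pick_vfio_module` is hypothetical, and the sketch assumes modules.alias lines have the form "alias <pattern> <module>" and that shell-style globbing is an adequate stand-in for the real alias matching.

```python
from fnmatch import fnmatchcase


def pick_vfio_module(modalias, alias_text):
    """Return the most specific vfio_ module for a device modalias, or None."""
    wanted = "vfio_" + modalias  # step 3: add the prefix
    candidates = []
    for line in alias_text.splitlines():
        fields = line.split()
        # Skip comments and anything that is not "alias <pattern> <module>".
        if len(fields) != 3 or fields[0] != "alias":
            continue
        pattern, module = fields[1], fields[2]
        if fnmatchcase(wanted, pattern):
            # Fewer wildcards means a more specific match (step 4).
            candidates.append((pattern.count("*"), module))
    if not candidates:
        return None
    return min(candidates)[1]


SAMPLE = """\
alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
"""

print(pick_vfio_module(
    "pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00", SAMPLE))
```

On a real system the modalias would come from the device's sysfs modalias file and the alias text from the modules.alias of the running kernel. The returned name would then be passed to modprobe and written to driver_override, as in steps 5 and 6.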

This patch is the infrastructure to provide the information in the
modules.alias to userspace. Convert the only VFIO pci_driver, which
results in one new line in the modules.alias:

   alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

Later series introduce additional HW-specific VFIO PCI drivers, such as
mlx5_vfio_pci.

Are we good with this commit message?

And with the code logic?

We would like to send V2 with the proposed fixes and the above commit message, and get your ack on it.

Our goal is to merge this series, along with the first preparation series "Provide core infrastructure for managing open/release" sent by Jason, into kernel 5.15.

The first series is in the final review phase, but this series mostly depends on this patch. For the other patches we have some kind of agreement.

Hopefully we can collect more "Reviewed-by" tags before sending V2.


diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 7c97fa8e36bc..f53b38e8f696 100644
+++ b/scripts/mod/file2alias.c
@@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
  	return 1;
  }
-/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
+/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
  static int do_pci_entry(const char *filename,
  			void *symval, char *alias)
  {
@@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
  	DEF_FIELD(symval, pci_device_id, subdevice);
  	DEF_FIELD(symval, pci_device_id, class);
  	DEF_FIELD(symval, pci_device_id, class_mask);
+	DEF_FIELD(symval, pci_device_id, flags);
I'm a little bit wary of adding a new field to this kernel/user
interface just for this single bit.  Maybe it's justified but feels
like it's worth being careful.
A couple of us looked at this in one of the RFC threads..

As far as we could tell this is not a kernel/user interface. It is an
interface within kbuild between gcc and file2alias and is not used or
really exported beyond the kernel build sequence.

Debian code search didn't find anything, for instance.

modules.alias, as output by file2alias during kbuild, is the canonical
"kernel/user" interface here. Everything that needs this data should
be using that.

Thanks,
Jason


