On Wed, Aug 18, 2021 at 06:16:03PM +0300, Yishai Hadas wrote:
> From: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
>
> Allow device drivers to include match entries in the modules.alias file
> produced by kbuild that are not used for normal driver autoprobing and
> module autoloading. Drivers using these match entries can be connected
> to the PCI device manually, by userspace, using the existing
> driver_override sysfs.
>
> To achieve it, we add the 'override_only' bitmap to struct pci_device_id
> and a helper macro named 'PCI_DEVICE_DRIVER_OVERRIDE' to enable setting
> specific bits on it.
>
> The first bit (i.e. 'PCI_ID_F_VFIO_DRIVER_OVERRIDE') indicates that the
> match entry is for the VFIO subsystem, it can be set by another helper
> macro named 'PCI_DRIVER_OVERRIDE_DEVICE_VFIO'.
>
> These match entries are prefixed with "vfio_" in the modules.alias.
>
> For example the resulting modules.alias may have:
>
> alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
> alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
> alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
>
> In this example mlx5_core and mlx5_vfio_pci match to the same PCI
> device. The kernel will autoload and autobind to mlx5_core but the kernel
> and udev mechanisms will ignore mlx5_vfio_pci.
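The following is a minimal userspace sketch of why the vfio_ entries stay out of autoloading. It assumes that a modalias lookup behaves like a plain fnmatch(3) glob match against each modules.alias pattern; kmod's real index lookup is more involved. The alias table is copied from the example above. struct alias_entry and count_autoload_matches() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* One modules.alias line: glob pattern plus module name. */
struct alias_entry {
	const char *pattern;
	const char *module;
};

/* The three example lines from the commit message. */
static const struct alias_entry example_aliases[] = {
	{ "pci:v000015B3d00001021sv*sd*bc*sc*i*", "mlx5_core" },
	{ "vfio_pci:v000015B3d00001021sv*sd*bc*sc*i*", "mlx5_vfio_pci" },
	{ "vfio_pci:v*d*sv*sd*bc*sc*i*", "vfio_pci" },
};
#define NR_EXAMPLE_ALIASES \
	(sizeof(example_aliases) / sizeof(example_aliases[0]))

/*
 * Count the aliases whose pattern matches @modalias and report the first
 * matching module through @first (NULL if nothing matched).
 */
static size_t count_autoload_matches(const char *modalias, const char **first)
{
	size_t i, n = 0;

	assert(modalias != NULL && first != NULL);
	*first = NULL;
	for (i = 0; i < NR_EXAMPLE_ALIASES; i++) {
		if (fnmatch(example_aliases[i].pattern, modalias, 0) != 0)
			continue;
		if (!*first)
			*first = example_aliases[i].module;
		n++;
	}
	return n;
}
```

Given the kernel-generated modalias pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00, only the mlx5_core line matches. No string that starts with "pci:" can match a pattern that starts with "vfio_pci:".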
>
> When userspace wants to change a device to the VFIO subsystem userspace
> can implement a generic algorithm:
>
> 1) Identify the sysfs path to the device:
>     /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
>
> 2) Get the modalias string from the kernel:
>     $ cat /sys/bus/pci/devices/0000:01:00.0/modalias
>     pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
>
> 3) Prefix it with vfio_:
>     vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
>
> 4) Search modules.alias for the above string and select the entry that
>    has the fewest *'s:
>     alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
>
> 5) modprobe the matched module name:
>     $ modprobe mlx5_vfio_pci
>
> 6) cat the matched module name to driver_override:
>     echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
>
> 7) unbind device from original module
>     echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
>
> 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci)
>     echo 0000:01:00.0 > /sys/bus/pci/drivers_probe
>
> The algorithm is independent of bus type. In future the other buses's with

s/buses's/buses/

> VFIO device drivers, like platform and ACPI, can use this algorithm as
> well.
>
> This patch is the infrastructure to provide the information in the
> modules.alias to userspace. Convert the only VFIO pci_driver which results
> in one new line in the modules.alias:
>
> alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
>
> Later series introduce additional HW specific VFIO PCI drivers, such as
> mlx5_vfio_pci.
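Steps 3 and 4 of the algorithm above can be sketched in userspace C as follows. This is a hedged illustration, not a reference implementation. It assumes that plain fnmatch(3) globbing approximates how modules.alias patterns are matched. It also assumes that ties on the '*' count go to the first entry, which the commit message does not specify. pick_vfio_module(), count_stars() and struct alias_entry are hypothetical names.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>

struct alias_entry {
	const char *pattern;	/* glob from modules.alias */
	const char *module;	/* module that provides it */
};

/* Number of '*' wildcards in a pattern; fewer means more specific. */
static size_t count_stars(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (*s == '*')
			n++;
	return n;
}

/*
 * Steps 3 and 4: prefix @modalias with "vfio_", then return the module of
 * the matching alias with the fewest '*'s (first one wins on a tie), or
 * NULL if nothing matches.
 */
static const char *pick_vfio_module(const char *modalias,
				    const struct alias_entry *tbl, size_t n)
{
	char key[256];
	const char *best = NULL;
	size_t best_stars = (size_t)-1;
	size_t i;
	int len;

	assert(modalias != NULL && tbl != NULL);
	len = snprintf(key, sizeof(key), "vfio_%s", modalias);
	if (len < 0 || (size_t)len >= sizeof(key))
		return NULL;	/* modalias too long for this sketch */

	for (i = 0; i < n; i++) {
		size_t stars;

		if (fnmatch(tbl[i].pattern, key, 0) != 0)
			continue;
		stars = count_stars(tbl[i].pattern);
		if (stars < best_stars) {
			best_stars = stars;
			best = tbl[i].module;
		}
	}
	return best;
}
```

With the three example aliases, the 15B3:1021 modalias resolves to mlx5_vfio_pci, because its pattern has 5 wildcards against the catch-all's 7. Any other PCI device falls through to the vfio_pci catch-all.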
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
> ---
>  Documentation/PCI/pci.rst         |  1 +
>  drivers/pci/pci-driver.c          | 27 ++++++++++++++++++++-------
>  drivers/vfio/pci/vfio_pci.c       |  9 ++++++++-
>  include/linux/mod_devicetable.h   |  6 ++++++
>  include/linux/pci.h               | 28 ++++++++++++++++++++++++++++
>  scripts/mod/devicetable-offsets.c |  1 +
>  scripts/mod/file2alias.c          |  8 ++++++--
>  7 files changed, 70 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
> index fa651e25d98c..87c6f4a6ca32 100644
> --- a/Documentation/PCI/pci.rst
> +++ b/Documentation/PCI/pci.rst
> @@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
>    - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
>    - class and classmask fields default to 0
>    - driver_data defaults to 0UL.
> +  - override_only field defaults to 0.
>
>  Note that driver_data must match the value used by any of the pci_device_id
>  entries defined in the driver. This makes the driver_data field mandatory
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3a72352aa5cf..8a6bd3364127 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -136,7 +136,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>  						    struct pci_dev *dev)
>  {
>  	struct pci_dynid *dynid;
> -	const struct pci_device_id *found_id = NULL;
> +	const struct pci_device_id *found_id = NULL, *ids;
>
>  	/* When driver_override is set, only bind to the matching driver */
>  	if (dev->driver_override && strcmp(dev->driver_override, drv->name))
> @@ -152,14 +152,27 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
>  	}
>  	spin_unlock(&drv->dynids.lock);
>
> -	if (!found_id)
> -		found_id = pci_match_id(drv->id_table, dev);
> +	if (found_id)
> +		return found_id;
>
> -	/* driver_override will always match, send a dummy id */
> -	if (!found_id && dev->driver_override)
> -		found_id = &pci_device_id_any;
> +	for (ids = drv->id_table; (found_id = pci_match_id(ids, dev));
> +	     ids = found_id + 1) {
> +		/*
> +		 * The match table is split based on driver_override. Check the
> +		 * override_only as well so that any matching entry is
> +		 * returned.
> +		 */
> +		if (!found_id->override_only || dev->driver_override)
> +			return found_id;

The negation makes this short, but IMO it makes this harder to read.
I'd rather test for the special case directly instead of testing for
the *absence* of the special case, e.g.,

	if (found_id->override_only) {
		if (dev->driver_override)
			return found_id;
	} else {
		return found_id;
	}

> +	}
>
> -	return found_id;
> +	/*
> +	 * if no static match, driver_override will always match, send a dummy
> +	 * id.

I think the original comment was better.  This comment implies that
we only checked for static matches above, but we actually checked for
*both* dynamic IDs and static IDs.
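To make the override_only semantics concrete, below is a self-contained userspace model of the static-table loop, written with the direct test suggested above. struct id_entry, struct dev_model, match_id() and match_static() are hypothetical stand-ins, not kernel API. The model deliberately ignores dynamic IDs, the driver_override name comparison and the pci_device_id_any fallback.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_ID (~0u)

/* Hypothetical, trimmed-down stand-ins for pci_device_id and pci_dev. */
struct id_entry {
	unsigned int vendor, device;	/* ANY_ID matches any value */
	unsigned int override_only;	/* non-zero: driver_override only */
};

struct dev_model {
	unsigned int vendor, device;
	const char *driver_override;	/* NULL when not set */
};

/*
 * Rough analogue of pci_match_id(): return the first entry at or after
 * @ids that matches @dev, or NULL at the (simplified) all-zero terminator.
 */
static const struct id_entry *match_id(const struct id_entry *ids,
				       const struct dev_model *dev)
{
	for (; ids->vendor || ids->device; ids++) {
		if ((ids->vendor == ANY_ID || ids->vendor == dev->vendor) &&
		    (ids->device == ANY_ID || ids->device == dev->device))
			return ids;
	}
	return NULL;
}

/*
 * Static-table part of pci_match_device(), assuming any driver_override
 * has already been checked against the driver name.
 */
static const struct id_entry *match_static(const struct id_entry *table,
					   const struct dev_model *dev)
{
	const struct id_entry *ids, *found;

	assert(table != NULL && dev != NULL);
	for (ids = table; (found = match_id(ids, dev)); ids = found + 1) {
		if (found->override_only) {
			/* Special case: usable only via driver_override. */
			if (dev->driver_override)
				return found;
		} else {
			return found;
		}
	}
	return NULL;
}
```

Consider a table that holds only an override_only catch-all, as vfio_pci_table does. A device matches nothing in it until driver_override is set. In a mixed table, an ordinary entry later in the table is still found after the override_only entry is skipped.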
> +	 */
> +	if (dev->driver_override)
> +		return &pci_device_id_any;
> +	return NULL;
>  }
>
>  /**
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 07edddf7e6ca..c52620ac5e70 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -180,9 +180,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
>  	return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
>  }
>
> +static const struct pci_device_id vfio_pci_table[] = {
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, vfio_pci_table);
> +
>  static struct pci_driver vfio_pci_driver = {
>  	.name			= "vfio-pci",
> -	.id_table		= NULL, /* only dynamic ids */
> +	.id_table		= vfio_pci_table,
>  	.probe			= vfio_pci_probe,
>  	.remove			= vfio_pci_remove,
>  	.sriov_configure	= vfio_pci_sriov_configure,
> diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
> index 8e291cfdaf06..39c229a7ab8c 100644
> --- a/include/linux/mod_devicetable.h
> +++ b/include/linux/mod_devicetable.h
> @@ -16,6 +16,10 @@ typedef unsigned long kernel_ulong_t;
>
>  #define PCI_ANY_ID (~0)
>
> +enum {
> +	PCI_ID_F_VFIO_DRIVER_OVERRIDE	= 1 << 0,
> +};
> +
>  /**
>   * struct pci_device_id - PCI device ID structure
>   * @vendor:		Vendor ID to match (or PCI_ANY_ID)
> @@ -34,12 +38,14 @@ typedef unsigned long kernel_ulong_t;
>   *			Best practice is to use driver_data as an index
>   *			into a static list of equivalent device types,
>   *			instead of using it as a pointer.
> + * @override_only:	Bitmap for override_only PCI drivers.

"Match only when dev->driver_override is this driver"?

As far as PCI core is concerned there's no need for this to be a
bitmap.

I think this would make more sense if split into two patches.  The
first would add override_only and change pci_match_device().  Then
there's no confusion about whether this is specific to VFIO.
The second can add PCI_ID_F_VFIO_DRIVER_OVERRIDE and make the
file2alias.c changes.  Most of the commit log applies to this part.

>   */
>  struct pci_device_id {
>  	__u32 vendor, device;		/* Vendor and device ID or PCI_ANY_ID*/
>  	__u32 subvendor, subdevice;	/* Subsystem ID's or PCI_ANY_ID */
>  	__u32 class, class_mask;	/* (class,subclass,prog-if) triplet */
>  	kernel_ulong_t driver_data;	/* Data private to the driver */
> +	__u32 override_only;
>  };
>
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 540b377ca8f6..57f9aa60f3b4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -901,6 +901,34 @@ struct pci_driver {
>  	.vendor = (vend), .device = (dev), \
>  	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID
>
> +/**
> + * PCI_DEVICE_DRIVER_OVERRIDE - macro used to describe a PCI device with
> + *                              override_only flags.
> + * @vend: the 16 bit PCI Vendor ID
> + * @dev: the 16 bit PCI Device ID
> + * @driver_override: PCI Device override_only bitmap
> + *
> + * This macro is used to create a struct pci_device_id that matches a
> + * specific device. The subvendor and subdevice fields will be set to
> + * PCI_ANY_ID.
> + */
> +#define PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, driver_override) \
> +	.vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
> +	.subdevice = PCI_ANY_ID, .override_only = (driver_override)
> +
> +/**
> + * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
> + *                                   "driver_override" PCI device.
> + * @vend: the 16 bit PCI Vendor ID
> + * @dev: the 16 bit PCI Device ID
> + *
> + * This macro is used to create a struct pci_device_id that matches a
> + * specific device. The subvendor and subdevice fields will be set to
> + * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE.
> + */
> +#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
> +	PCI_DEVICE_DRIVER_OVERRIDE(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> +
>  /**
>   * PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
>   * @vend: the 16 bit PCI Vendor ID
> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
> index 9bb6c7edccc4..cc3625617a0e 100644
> --- a/scripts/mod/devicetable-offsets.c
> +++ b/scripts/mod/devicetable-offsets.c
> @@ -42,6 +42,7 @@ int main(void)
>  	DEVID_FIELD(pci_device_id, subdevice);
>  	DEVID_FIELD(pci_device_id, class);
>  	DEVID_FIELD(pci_device_id, class_mask);
> +	DEVID_FIELD(pci_device_id, override_only);
>
>  	DEVID(ccw_device_id);
>  	DEVID_FIELD(ccw_device_id, match_flags);
> diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> index 7c97fa8e36bc..c3edbf73157e 100644
> --- a/scripts/mod/file2alias.c
> +++ b/scripts/mod/file2alias.c
> @@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
>  	return 1;
>  }
>
> -/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
> +/* Looks like: pci:vNdNsvNsdNbcNscNiN or <prefix>_pci:vNdNsvNsdNbcNscNiN. */
>  static int do_pci_entry(const char *filename,
>  			void *symval, char *alias)
>  {
> @@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
>  	DEF_FIELD(symval, pci_device_id, subdevice);
>  	DEF_FIELD(symval, pci_device_id, class);
>  	DEF_FIELD(symval, pci_device_id, class_mask);
> +	DEF_FIELD(symval, pci_device_id, override_only);
>
> -	strcpy(alias, "pci:");
> +	if (override_only & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
> +		strcpy(alias, "vfio_pci:");
> +	else
> +		strcpy(alias, "pci:");
>  	ADD(alias, "v", vendor != PCI_ANY_ID, vendor);
>  	ADD(alias, "d", device != PCI_ANY_ID, device);
>  	ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);
> --
> 2.18.1
>
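For reference, here is a trimmed-down userspace sketch of the alias string the modified do_pci_entry() would emit for an entry that sets only vendor and device. build_pci_alias(), fmt_field() and ID_F_VFIO_DRIVER_OVERRIDE are hypothetical names. The real code builds the string with its ADD() and add_wildcard() helpers and also handles the subsystem and class fields, which this sketch simply leaves as wildcards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ANY_ID (~0u)
/* Same bit value the patch gives PCI_ID_F_VFIO_DRIVER_OVERRIDE. */
#define ID_F_VFIO_DRIVER_OVERRIDE (1u << 0)

/* Format one ID field: "*" for ANY_ID, otherwise eight hex digits. */
static void fmt_field(char out[9], unsigned int val)
{
	if (val == ANY_ID)
		strcpy(out, "*");
	else
		snprintf(out, 9, "%08X", val);
}

/*
 * Build the alias for a vendor/device-only entry into @buf; returns the
 * snprintf() result. Only the prefix depends on @override_only.
 */
static int build_pci_alias(char *buf, size_t size, unsigned int vendor,
			   unsigned int device, unsigned int override_only)
{
	char v[9], d[9];
	const char *prefix;

	assert(buf != NULL && size > 0);
	prefix = (override_only & ID_F_VFIO_DRIVER_OVERRIDE) ?
		 "vfio_pci:" : "pci:";
	fmt_field(v, vendor);
	fmt_field(d, device);
	/* Subsystem IDs and class are left as wildcards in this sketch. */
	return snprintf(buf, size, "%sv%sd%ssv*sd*bc*sc*i*", prefix, v, d);
}
```

Three inputs reproduce the three alias patterns from the commit message example: (0x15B3, 0x1021, 0), the same pair with ID_F_VFIO_DRIVER_OVERRIDE, and (ANY_ID, ANY_ID, ID_F_VFIO_DRIVER_OVERRIDE).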