Hey Laine, On 23/08/2022 15:11, Laine Stump wrote: > ping. > > I have a different version of this patch where I do read the modules.alias file > rather than just checking the name of the driver, but that also requires "double > mocking" open() in the unit test, which wasn't working properly, and I'd rather > not spend the time figuring it out if it's not going to be needed. (Alex prefers > that version because it is more correct than just checking the name, and he's > concerned that the new sysfs-based API may take longer than we're thinking to > get into downstream distros, but the version in this patch does satisfy both > Jason and Daniel's suggested implementations). Anyway, I can post the other > patch if anyone is interested. > [sorry for the thread necromancy] I was wondering if you're planning on respinning this work, or rather the modalias approach alternative you mention? Or perhaps we are waiting for the cdev sysfs? Though, there's still a significant amount of kernel versions to cover that won't have the sysfs entry sadly :( > On 8/11/22 1:00 AM, Laine Stump wrote: >> Before a PCI device can be assigned to a guest with VFIO, that device >> must be bound to the vfio-pci driver rather than to the device's >> normal driver. The vfio-pci driver provides APIs that permit QEMU to >> perform all the necessary operations to make the device accessible to >> the guest. >> >> There has been kernel work recently to support vendor/device-specific >> VFIO variant drivers that provide the basic vfio-pci driver functionality >> while adding support for device-specific operations (for example these >> device-specific drivers are planned to support live migration of >> certain devices). All that will be needed to make this functionality >> available will be to bind the new vendor-specific driver to the device >> (rather than the generic vfio-pci driver, which will continue to work >> just without the extra functionality). >> >> But until now libvirt has required that all PCI devices being assigned >> to a guest with VFIO specifically have the "vfio-pci" driver bound to >> the device. So even if the user manually binds a shiny new >> vendor-specific vfio variant driver to the device (and puts >> "managed='no'" in the config to prevent libvirt from changing the >> binding), libvirt will just fail during startup of the guest (or >> during hotplug) because the driver bound to the device isn't exactly >> "vfio-pci". >> >> This patch loosens that restriction a bit - rather than requiring that >> the device be bound to "vfio-pci", it also checks if the drivername >> contains the string "vfio" at all, and in this case allows the >> operation to continue. If the driver is in fact a VFIO variant, then >> the assignment will succeed, but if it is not a VFIO variant then QEMU >> will fail (and report the error back to libvirt). >> >> In the near future (possibly by kernel 6.0) there will be a >> formal method of identifying a VFIO variant driver by looking in >> sysfs; in the meantime the inexact, but simple, method in this patch >> will allow users of the few existing VFIO variant drivers (and >> developers of new VFIO variant drivers) to use their new drivers >> without needing to remove libvirt from their setup - they can simply >> pre-bind the device to the new driver, then use "managed='no'" in >> their libvirt config. >> >> NB: this patch does *not* handle automatically determining the proper >> vendor-specific driver and binding to it in the case of >> "managed='yes'". This will be implemented later when there is a widely >> available driver / device combo we can use for testing. >> >> Signed-off-by: Laine Stump <laine@xxxxxxxxxx> >> --- >> V1 here: https://listman.redhat.com/archives/libvir-list/2022-August/233327.html >> >> Change in V2: >> >> V1 used the output of modinfo to look for "vfio_pci" as an alias for a >> driver to see if it was a VFIO variant driver. >> >> As a result of discussion of V1, V2 is much simpler - it just assumes >> that any driver with "vfio" in the name is a VFIO variant. This is >> okay because 1) QEMU will still catch it and libvirt will properly log >> the error if the driver isn't actually a VFIO variant, and 2) it's a >> temporary situation, just to enable use of VFIO variant drivers with >> libvirt until a standard method of detecting this is added to sysfs >> (which, according to the discussion of V1, is coming in the near >> future). >> >> (NB: I did implement checking of /lib/modules/`uname -r`/modules.alias >> as suggested by Erik, but it turned out that this caused the unit >> tests to call uname(3) and open the modules.alias file on the test >> host - for a proper unit test I would have also needed to mock these >> two functions, and it seemed like too much complexity for a temporary >> workaround. I've implemented Jason's suggestion here (accept any >> driver with "vfio" in the name), which is similar to danpb's >> suggestion (accept specifically the two drivers that are already in >> the upstream kernel), but will also allow for new drivers that may be >> under development.) >> >> src/hypervisor/virhostdev.c | 26 ++++--------- >> src/util/virpci.c | 76 ++++++++++++++++++++++++++++++++++--- >> src/util/virpci.h | 3 ++ >> 3 files changed, 82 insertions(+), 23 deletions(-) >> >> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c >> index c0ce867596..15b35fa75e 100644 >> --- a/src/hypervisor/virhostdev.c >> +++ b/src/hypervisor/virhostdev.c >> @@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, >> mgr->inactivePCIHostdevs) < 0) >> goto reattachdevs; >> } else { >> - g_autofree char *driverPath = NULL; >> - g_autofree char *driverName = NULL; >> - int stub; >> + g_autofree char *drvName = NULL; >> + virPCIStubDriver drvType; >> /* Unmanaged devices should already have been marked as >> * inactive: if that's the case, we can simply move on */ >> @@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, >> * information about active / inactive device across >> * daemon restarts has been implemented */ >> - if (virPCIDeviceGetDriverPathAndName(pci, >> - &driverPath, &driverName) < 0) >> + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0) >> goto reattachdevs; >> - stub = virPCIStubDriverTypeFromString(driverName); >> - >> - if (stub > VIR_PCI_STUB_DRIVER_NONE && >> - stub < VIR_PCI_STUB_DRIVER_LAST) { >> + if (drvType > VIR_PCI_STUB_DRIVER_NONE) { >> /* The device is bound to a known stub driver: store this >> * information and add a copy to the inactive list */ >> - virPCIDeviceSetStubDriver(pci, stub); >> + virPCIDeviceSetStubDriver(pci, drvType); >> VIR_DEBUG("Adding PCI device %s to inactive list", >> virPCIDeviceGetName(pci)); >> @@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager >> *hostdev_mgr, >> /* Let's check if all PCI devices are NVMe disks. */ >> for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) { >> virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i); >> - g_autofree char *drvPath = NULL; >> g_autofree char *drvName = NULL; >> - int stub = VIR_PCI_STUB_DRIVER_NONE; >> + virPCIStubDriver drvType; >> - if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) >> + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0) >> goto cleanup; >> - if (drvName) >> - stub = virPCIStubDriverTypeFromString(drvName); >> - >> - if (stub == VIR_PCI_STUB_DRIVER_VFIO || >> - STREQ_NULLABLE(drvName, "nvme")) >> + if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName, >> "nvme")) >> continue; >> VIR_WARN("Suspicious NVMe disk assignment. PCI device " >> diff --git a/src/util/virpci.c b/src/util/virpci.c >> index 7800966963..51ccf4d9fd 100644 >> --- a/src/util/virpci.c >> +++ b/src/util/virpci.c >> @@ -277,6 +277,71 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char >> **path, char **name) >> } >> +/** >> + * virPCIDeviceGetDriverNameAndType: >> + * @dev: virPCIDevice object to examine >> + * @drvName: returns name of driver bound to this device (if any) >> + * @drvType: returns type of driver if it is a known stub driver type >> + * >> + * Find the name of the driver bound to @dev (if any) and the type of >> + * the driver if it is a known/recognized "stub" driver (based on the >> + * driver name). >> + * >> + * There are vfio "variant" drivers that provide all the basic >> + * functionality of the standard vfio-pci driver as well as additional >> + * stuff. There is a plan to add info to sysfs that will allow easily >> + * determining if a driver is a vfio variant driver, but that sysfs >> + * entry isn't yet available. In the meantime as a workaround so that >> + * the few existing vfio variant drivers can be used with libvirt, and >> + * so that driver developers can test their new vfio variant drivers >> + * without needing to bypass libvirt, we also check if the driver name >> + * contains the string "vfio"; if it does, then we consider this drier >> + * as type VFIO. This can lead to false positives, but that isn't a >> + * horrible thing, because the problem will still be caught by QEMU as >> + * soon as libvirt makes the request to attach the device. >> + * >> + * Return 0 on success, -1 on failure. If -1 is returned, then an error >> + * message has been logged. >> + */ >> +int >> +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev, >> + char **drvName, >> + virPCIStubDriver *drvType) >> +{ >> + g_autofree char *drvPath = NULL; >> + int tmpType; >> + >> + if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0) >> + return -1; >> + >> + if (!*drvName) { >> + *drvType = VIR_PCI_STUB_DRIVER_NONE; >> + return 0; >> + } >> + >> + tmpType = virPCIStubDriverTypeFromString(*drvName); >> + >> + if (tmpType > VIR_PCI_STUB_DRIVER_NONE) { >> + *drvType = tmpType; >> + return 0; /* exact match of a known driver name (or no name) */ >> + } >> + >> + /* Check if the drivername contains "vfio" and count as a VFIO >> + * driver if so - see above for explanation. >> + */ >> + >> + if (strstr(*drvName, "vfio")) { >> + VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName); >> + *drvType = VIR_PCI_STUB_DRIVER_VFIO; >> + } else { >> + VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName); >> + *drvType = VIR_PCI_STUB_DRIVER_NONE; >> + } >> + >> + return 0; >> +} >> + >> + >> static int >> virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal) >> { >> @@ -1004,8 +1069,8 @@ virPCIDeviceReset(virPCIDevice *dev, >> virPCIDeviceList *activeDevs, >> virPCIDeviceList *inactiveDevs) >> { >> - g_autofree char *drvPath = NULL; >> g_autofree char *drvName = NULL; >> + virPCIStubDriver drvType; >> int ret = -1; >> int fd = -1; >> int hdrType = -1; >> @@ -1032,15 +1097,16 @@ virPCIDeviceReset(virPCIDevice *dev, >> * reset it whenever appropriate, so doing it ourselves would just >> * be redundant. >> */ >> - if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0) >> + if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0) >> goto cleanup; >> - if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) { >> - VIR_DEBUG("Device %s is bound to vfio-pci - skip reset", >> - dev->name); >> + if (drvType == VIR_PCI_STUB_DRIVER_VFIO) { >> + >> + VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName); >> ret = 0; >> goto cleanup; >> } >> + >> VIR_DEBUG("Resetting device %s", dev->name); >> if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0) >> diff --git a/src/util/virpci.h b/src/util/virpci.h >> index 4d9193f24e..0532b90f90 100644 >> --- a/src/util/virpci.h >> +++ b/src/util/virpci.h >> @@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev); >> int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, >> char **path, >> char **name); >> +int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev, >> + char **drvName, >> + virPCIStubDriver *drvType); >> int virPCIDeviceIsPCIExpress(virPCIDevice *dev); >> int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev); >