[libvirt PATCH 01/15] util: properly deal with module vs. driver when binding device to driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(Sigh, resent because I forgot to update the list address in the reply :-/)

Sorry, I meant to respond to this earlier and forgot. Now I'm sending v2 of the patches, and wanted to make sure you didn't think I was just ignoring your comments :-) (since this patch is still there in V2)

On 10/23/23 9:39 AM, Jason Gunthorpe wrote:
On Mon, Oct 23, 2023 at 12:54:37AM -0400, Laine Stump wrote:

When we recently gained the ability to manually specify a driver to
bind to with virsh nodedev-detach, the fragility of this system became
apparent - if a user gives the driver name as "vfio_pci", then we
would modprobe the module, but then erroneously believe it hadn't been
loaded because /sys/bus/pci/drivers/vfio_pci didn't exist. For manual
specification of the driver name, this could be dealt with by telling
the user "always use the correct name for the driver, don't assume
that it is the same as the modulename", but it would still end up
confusing people, especially since some drivers do use underscore in
their name (e.g. the mlx5_vfio_pci driver/module).

I think you are creating more confusion by allowing this. The driver
name from module.alias should be used consistently in its exact
format.

But the name in the modules.alias file *is not* the name of the driver. It is the name of the module, and the name of the driver may or may not be the same - that's the entire reason this issue came to light.

And in the end I don't do anything more than what modprobe itself does. (Well, actually that's not true, since modprobe never has to actually *use* the drivers that are implemented by the modules that it loads, it doesn't need to understand that the name of the driver may be different from the name of the module, and so it doesn't have to look at the link in /sys/modules/$modulename/drivers/pci:$drivername to get the proper name that must be written to sysfs to get the driver bound to the device. It just unilaterally decides that all horizontal lines in a module name are "_", and so it replaces all "-" with "_")

Really it is modprobe's practice of auto-translating all "-" to "_", combined with the inconsistency of naming between module names and driver names that is causing the confusion; what this patch is doing is helping to clear up that confusion, not make it worse. Essentially, whether someone uses the module name or the driver name, libvirt will get the correct driver bound to the device. (and we do have to allow both module name and driver name; the former in cases where the module hasn't been loaded yet and so the driver isn't available, and the latter in cases where the driver is statically linked into the kernel, as discussed below).


Log a 'did you mean XYZ" or something if it doesn't work out.

Pass the "driver name" directly to modprobe and let it sort it out

In the cases (e.g. vfio_pci/vfio-pci) where the module name and driver name mismatch only due to -/_, what you say will work as long as the *driver* name is used (i.e. "vfio-pci"), but if we use the name discovered from modules.alias ("vfio_pci") then the modprobe will succeed, but the bind will fail.

And we can't just assume that the driver name will always have "-" in place of "_" and do an auto-replace in the name prior to binding the device to the driver, because some drivers (in particular, mlx5_vfio_pci/mlx5_vfio_pci) *don't* use "-" in the driver name. libvirt has to do *something* in order to work in both of these cases, and I think anything less than what I've done would be "half baked" (well, half *something*, but I'd rather not swear on a public forum :-))


Forget about modules entirely in the libvirt level, don't even check
it.

Yes, if we're manually specifying the driver, then (at least in the case of vfio-pci and mlx5_vfio_pci) it *does* work to always use the driver name (because modprobe will change vfio-pci to vfio_pci, and leave mlx5_vfio_pci alone). But when we're discovering the variant driver via modules.alias (which is the official method of finding the best VFIO variant driver for a device), we don't start out with the driver name, we start out with (a modalias string, that leads us to) the *module* name, not a driver name. And while modprobe will of course accept the name of the module to load, you can't use the module name to bind the driver to the kernel - you have to do *something* to derive the driver name from the module name. That's what this code does.

(I haven't seen an example where module name and driver name have more differences than just "-" vs "_", and it appears to be convention to "not do that", however it theoretically is completely doable - the driver name is just a string in a C file. I'd rather not have my code break at some random time in the future when somebody writes such a driver.)


c) All of this is conveniently ignoring the possibility of a VFIO
    variant driver that is statically linked in the kernel. The entire
    design of variant driver auto-detection is based on doing a lookup
    in modules.alias, and that only lists *loadable modules* (not
    drivers), so unless I'm missing something, it would be impossible
    to auto-detect a VFIO variant driver that was statically
    linked. This is beyond libvirt's ability to fix; the best that
    could be done would be to manually specify the driver name in the
    libvirt config, which I suppose is better than nothing :-)

Why? This seems wrong. Statically linked drivers appear in
/sys/bus/drivers/XX too

Yes, they appear in /sys/bus/drivers. But there isn't (as far as I know so far) a way to programmatically associate a statically linked variant driver with a device using the information available from the running kernel.

What I've been told until now is that the only way to find the proper VFIO variant driver for a particular device is to take the modalias string from that device's sysfs directory, and search for a line in the modules.alias file that starts with "alias vfio_pci:" and has the "most specific match" for the device's modalias - *that* is the name of the *module* that contains the best VFIO variant driver for the device.

If a driver is statically linked into the kernel, then it doesn't have a module, so it doesn't get a line in modules.alias (does it? I suppose I should try that to verify, but it doesn't make sense), and so there is no way to identify that driver as being a match for a device.

Or am I missing something?

In this case, as I've said in the commit logs somewhere, it's still possible to use a statically linked variant driver for a device, you just have to manually specify the driver name rather than having code that discovers it (which is what you have to do for *all* variant drivers until the final patch of this series anyway :-)
_______________________________________________
Devel mailing list -- devel@xxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux