On Thu, Dec 16 2021, Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > On Wed, 15 Dec 2021 13:51:02 +0100 > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > >> On Wed, Dec 15 2021, Thomas Huth <thuth@xxxxxxxxxx> wrote: >> >> > On 14/12/2021 22.55, Tony Krowiak wrote: >> >> >> >> >> >> On 12/13/21 11:11, Cornelia Huck wrote: >> >>> One possibility is simply blocking autoload of the module in userspace by >> >>> default, and only allow it to be loaded automatically when e.g. qemu-kvm >> >>> is installed on the system. This is obviously something that needs to be >> >>> decided by the distros. >> >>> >> >>> (kvm might actually be autoloaded already, so autoloading vfio-ap would >> >>> not really make it worse.) >> >> >> >> Of the vfio_ccw module is automatically loaded, then the kvm >> >> module will also get loaded. I startup up a RHEL8.3 system and >> >> sure enough, the vfio_ccw module is loaded along with the >> >> kvm, vfio and mdev modules. If this is true for all distros, then >> >> it wouldn't make much difference if the vfio_ap module is >> >> autoloaded too. >> > >> > I think I don't mind too much if we auto-load vfio-ap or not - but I think >> > we should make it consistent with vfio-ccw. So either auto-load both modules >> > (if the corresponding devices are available), or remove the >> > MODULE_DEVICE_TABLE() entries from both modules? >> >> I think we really need to take a step back and think about the purpose >> of adding a MODULE_DEVICE_TABLE()... basically, it declares which types >> of devices on a certain bus a driver supports, in a way that can be >> consumed by userspace (after file2alias.c worked on it). > > I did a quick search to locate where this semantic was codified. But > I didn't find the place neither Documentation/ nor in the header where > MODULE_DEVICE_TABLE is defined. This is rather ancient code pre-dating git; the mechanism as it is now came in during the 2.5 device model redesign and introduction of udev, IIRC. > >> >> Userspace typically uses this to match devices it is notified about to >> drivers that could possibly drive those devices. In general, the >> assumption is that you will want to have the drivers for your devices >> loaded. In some cases (drivers only used in special cases, like here), >> it might be a better idea to autoload the drivers only under certain >> circumstances (e.g. if you know you're going to run KVM guests). > > Does RHEL do this, or would RHEL do this out of the box? I.e. > would we end up preserving old behavior when this fix hits the distro, > or would the end user end up with kvm and vfio_ap loaded (out of the > box)? > > What would be the mechanism of choice to implement if kvm loaded and > APs present/hotplugged load vfio_ap, otherwise don't in the userspace? > > Sorry I'm not very familiar with this whole modules auto-loading > business, so I may be asking obvious questions. But a quick google > search did not help me. See /usr/lib/udev/rules.d/80-drivers.rules -- modules are automatically loaded based on the alias when a device is added. Autoload can be also be prevented; see the 'blacklist' directive in man 5 modprobe.d, which will basically prevent autoloading based on the module-based aliases (you will have to load the modules directly instead.) This is likely to be the case on other distros as well. An option would be to e.g. add a 'blacklist' statement for vfio-ap, and have the qemu-kvm package install a udev rule that loads vfio-ap if any supported ap devices are added. > >> >> My main point, however, is that we're talking about policy here: whether >> a potentially useful driver should be loaded or not is a decision that >> should be made by userspace. Not providing a MODULE_DEVICE_TABLE does >> not look like the right solution, as it deprives userspace of the >> information to autoload the driver, if it actually wants to do so. >> > > I'm sympathetic to this reading of the situation, but I'm not sure > it is as black and white as stated. > > I think the current state of affairs in the vfio_ap module is clearly a > bug. Yes, the current code is simply wrong. > > One can argue that not auto-loading vfio_ap and kvm per default out of > the box is not a bug. I mean the tooling (chzdev) seems to do fine > without this and just assuming that both kvm and vfio_ap will be needed > just because we have APs seems wrong. > > One of the more important guiding principles of Linux kernel development > is no userspace regressions. And I think suddenly getting vfio_ap and kvm > loaded just because we have AP devices can be thought of as a regression. I disagree, that does not have anything to do with regressions. > > So I'm sympathetic to Harald's view as well. > > Of course there is the solution that the distros should really make sure > the old behavior is preserved, or some smart behavior is introduced. But > regarding smart, I believe "if you have devices that are configured for > vfio_ap pass-through (with chzdev), then the vfio_ap module should get > loaded" is pretty much as smart as it gets. So blacklisting the module > by default in the distros looks like a viable option. If that is what > we want, we should probably ask the distros, because I don't think > it is just obviously their job to figure out that they have to do so. > I don't think the *zdev tools are the right place to control this in general, I'd prefer to do it with generic tools (see above). But the most important part is to configure this in userspace (and just fix the kernel). Nothing prevents anyone from doing their own thing :) > Disclaimer: I might be wrong about the current behavior, I didn't verify > my claims > > Also what does vfio-pci do? As far as I can tell vfio-pci does not > participate in module auto loading just because there are pci devices. > The have some smart override I don't quite understand: > https://patchwork.ozlabs.org/project/linux-pci/patch/20210826103912.128972-11-yishaih@xxxxxxxxxx/ > Before, I don't think they had a MODULE_DEVICE_TABLE: > https://elixir.bootlin.com/linux/v5.8.18/source/drivers/vfio/pci/vfio_pci.c I don't think it makes sense to look at pci for inspiration here; they have a myriad of different device types, while ap only has a few (and probably not that many different ones on a given system), and css only has one that really matters.