On 16.04.20 11:33, Cornelia Huck wrote:
> On Wed, 15 Apr 2020 08:08:24 +0200
> Harald Freudenberger <freude@xxxxxxxxxxxxx> wrote:
>
>> On 14.04.20 14:58, Cornelia Huck wrote:
>>> On Tue, 7 Apr 2020 15:20:03 -0400
>>> Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
>>>> +	/* The non-default driver's module must be loaded */
>>>> +	if (!try_module_get(drv->owner))
>>>> +		return 0;
>>> Is that really needed? I would have thought that the driver core's
>>> klist usage would make sure that the callback would not be invoked for
>>> drivers that are not registered anymore. Or am I missing a window?
>> The try_module_get() and module_put() is a result of review feedback from
>> my side. The ap bus core is static in the kernel whereas the
>> vfio dd is a kernel module. So there may be a race condition between
>> calling the callback function and removal of the vfio dd module.
>> There is similar code in zcrypt_api which does the same for the zcrypt
>> device drivers before using some variables or functions from the modules.
>> Help me, if this is outdated code and there is no need to adjust the
>> module reference counter any more, then I would be happy to remove
>> this code :-)
> I think the driver core already should keep us safe. A built-in bus
> calling a driver in a module is a very common pattern, and I think
> ->owner was introduced exactly for that case.
>
> Unless I'm really missing something obvious?

Hm. I tested similar code (see zcrypt_api.c, where try_module_get() and
module_put() calls surround the use of functions belonging to the
implementing driver). The driver module has a reference count of 0 when
it is not in use, so it can be removed at any time as long as nothing
related to the driver is pending. As soon as the driver is actually
used, try_module_get(...driver.owner) increases the reference count and
makes it impossible to remove the module. After use, module_put()
decreases the reference count again.

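To make the get/put pattern described above concrete, here is a minimal
user-space sketch of the idea. It is only a model under stated
assumptions, not the kernel's implementation: struct fake_module,
fake_try_module_get(), fake_module_put(), fake_try_unload(),
call_driver() and sample_callback() are all hypothetical names made up
for illustration. A single atomic counter stands in for the module
reference count, and the value -1 marks a module that has been unloaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MOD_GONE (-1)	/* assumption: -1 marks an unloaded module */

/* Hypothetical stand-in for struct module, reduced to a refcount. */
struct fake_module {
	atomic_int refcnt;
};

/* Models try_module_get(): take a reference unless the module is gone. */
static bool fake_try_module_get(struct fake_module *m)
{
	int v = atomic_load(&m->refcnt);

	while (v != MOD_GONE) {
		/* On failure v is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&m->refcnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Models module_put(): drop a reference taken earlier. */
static void fake_module_put(struct fake_module *m)
{
	int old = atomic_fetch_sub(&m->refcnt, 1);

	assert(old > 0);
}

/* Models module removal: only succeeds while nobody holds a reference. */
static bool fake_try_unload(struct fake_module *m)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&m->refcnt, &expected, MOD_GONE);
}

/* Hypothetical driver callback living in the "module". */
static int sample_callback(void)
{
	return 1;
}

/* Bus-side caller: pin the owner, call into the driver, unpin. */
static int call_driver(struct fake_module *owner, int (*cb)(void))
{
	int rc;

	if (!fake_try_module_get(owner))
		return 0;	/* module already gone, skip the callback */
	rc = cb();
	fake_module_put(owner);
	return rc;
}
```

In this model an unload attempt made while call_driver() holds its
reference fails, and a call_driver() made after a successful unload
returns 0 without ever touching the callback.
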
When I now remove the try_module_get() and module_put() calls and run
the modified code, I immediately face a crash when the module is removed
during use. I do see code in the kernel which does an initial
try_module_get() on the driver to increase the reference count, for
example when the driver registers. However, with that approach I see no
clear way to remove such a driver module any more. I know I had a fight
with a tester some years ago, where he stated that it is a valid
testcase to remove a device driver module 'during use of the driver'.

So I'd like to keep the try_module_get() and module_put() invocations in
the ap bus code until you convince me that there are other, maybe
better, ways to make sure the driver and its functions are available at
the time of the call. Maybe we can discuss this offline if you wish :-)