On Wed, Feb 26, 2025 at 05:02:23PM -0800, Greg KH wrote: > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote: > > The way misc device works you can't unload the module until all the > > FDs are closed and the misc code directly handles races with opening > > new FDs while modules are unloading. It is quite a different scheme > > than discussed in this thread. > > And I would argue that is it the _right_ scheme to be following overall > here. Removing modules with in-flight devices/drivers is to me is odd, > and only good for developers doing work, not for real systems, right? There are two issues and I've found these discussions get confused about two interrelated things: 1) Module lifetime and when modules are refcounted 2) How does device_driver remove() work, especially with hot unplug and /sys/../unbind while the module is *still loaded*. Noting, very explicitly, that you can unbind a device_driver without unloading the module. #1 should be strictly based around the needs of function pointers in the system. Ie stuff like ".owner = THIS_MODULE". #2 is challenging when the driver has a file descriptor. AFAIK there are only two broad choices: a) wait for all FDs to close in remove() (boo!) b) leave the FDs open but disable them and complete remove(). eg return -ENODEV to all system calls I think the kernel community has a strong preference for (b), but rdma had started with (a) long ago. So we fixed it to (b), netdev does (b), so do alot of places because (a) is, frankly, awful. Now.. how does that relate to module unbinding? The drivers are unbound now because we properly support hotunplug via (b). So when is it OK to allow a module with no bound drivers to remove while a zombie FD is still open? That largely revolves around who owns the struct file_operations. For misc_dev the driver module would own it, so it is impossible to unload the driver module even if the device driver was hot unplugged/unbound. For a subsystem, like rdma, the subsystem can own the file_operations. Now to allow the driver module to be unloaded we "simply" require the subsystem to fence all driver callbacks during device driver remove and subsystem unregister. ie if the subsystem knows it no longer can call the driver then it no longer needs a refcount on the driver module. This fence was necessary anyhow for RDMA because drivers had the pre-existing assumption that unregister was fencing all driver callbacks by waiting for the FDs to close. Drivers did not handle UAF races with something like pci_iounmp() and their concurrent driver callback threads. Once the fence was built it was straightforward to also allow driver module unload since the core code has NULL'd its copy of all the driver function pointers during unregister. Further, I'd argue this is the best model for subsystems to follow. Allowing driver code to continue to run after subsystem unregister forces the driver to deal with UAF removal races. This is too hard for drivers to implement correctly, and prevents unloading the driver module after the drivers have been unbound. Why do people care? Aside from obvious hot-unplug cases, like physical PCI hot plug on high-avaibility servers hated (a), there was a strong desire from folks running software HA schemes to be able to upgrade the driver module with minimal hits. They want to leave the application running and it is able to fast-recover when the FD becomes -ENODEV by opening a new one and keeping most of their internal state alive. > What is the requirement that means that you have to do this for function > pointers? I'm just pointing out that function pointers are not guaranteed to be valid forever in the linux model. Every function pointer is somehow being protected by a lifecycle that links back to the module lifecycle. Most of the time a driver author can ignore function pointer lifecycle analysis, but not always.. Jason