On 2024-01-30 21:47, Luis Chamberlain wrote: > On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote: >> The current implementation of try_module_get() requires the module to >> exist and be live as a precondition. While this may seem intuitive at >> first glance, enforcing the precondition can be tricky, considering that >> modules can be unloaded at any time if not previously taken. For >> instance, the caller could be preempted just before calling >> try_module_get(), and while preempted, the module could be unloaded and >> freed. More subtly, the module could also be unloaded at any point while >> executing try_module_get() before incrementing the refount with >> atomic_inc_not_zero(). >> >> Neglecting the precondition that the module must exist and be live can >> cause unexpected race conditions that can lead to crashes. However, >> ensuring that the precondition is met may require additional locking >> that increases the complexity of the code and can make it more >> error-prone. >> >> This patch adds a slower yet safer implementation of try_module_get() >> that checks if the module is valid by looking into the mod_tree before >> taking the module's refcount. This new function can be safely called on >> stale and invalid module pointers, relieving developers from the burden >> of ensuring that the module exists and is live before attempting to take >> it. >> >> The tree lookup and refcount increment are executed after taking the >> module_mutex to prevent the module from being unloaded after looking up >> the tree. >> >> Signed-off-by: Marco Pagani <marpagan@xxxxxxxxxx> > > It very much sounds like there is a desire to have this but without a > user, there is no justification. I was working on a set of patches to fix an issue in the fpga subsystem when I came across your commit 557aafac1153 ("kernel/module: add documentation for try_module_get()") that made me realize we also had a safety problem. To solve this problem for the fpga manager, we had to add a mutex to ensure the low-level module still exists before calling try_module_get(). However, having a safer version of try_module_get() would have simplified the code and made it more robust against changes. https://lore.kernel.org/linux-fpga/20240111160242.149265-1-marpagan@xxxxxxxxxx/ I suspect there may be other cases where try_module_get() is inadvertently called without ensuring that the module still exists that may benefit from a safer implementation. >> +bool try_module_get_safe(struct module *module) >> +{ >> + struct module *mod; >> + bool ret = true; >> + >> + if (!module) >> + goto out; >> + >> + mutex_lock(&module_mutex); > > If a user comes around then this should be mutex_lock_interruptible(), > and add might_sleep() Would it be okay to return false if it gets interrupted, or should I change the return type to int to propagate -EINTR? My concern with changing the signature is that it would be less straightforward to use the function in place of try_module_get(). >> + >> + /* >> + * Check if the address points to a valid live module and take >> + * the refcount only if it points to the module struct. >> + */ >> + mod = __module_address((unsigned long)module); >> + if (mod && mod == module && module_is_live(mod)) >> + __module_get(mod); >> + else >> + ret = false; >> + >> + mutex_unlock(&module_mutex); >> + >> +out: >> + return ret; >> +} >> +EXPORT_SYMBOL(try_module_get_safe); > > And EXPORT_SYMBOL_GPL() would need to be used. Okay, I initially used EXPORT_SYMBOL() to be compatible with try_module_get(). > > I'd also expect selftests to be expanded for this case, but again, > without a user, this is just trying to resolve a problem which does not > exist. I can add selftests in the next versions. Thanks, Marco