On Tue, Apr 18, 2023 at 01:46:36PM -0700, Luis Chamberlain wrote: > This adds debugging support to the kernel module auto-loader to > easily detect and deal with duplicate module requests. To aid > with possible bootup failure issues it will supress the waste > in virtual memory when races happen before userspace loads a > module and the kernel is still issuing requests for the same > module. > > Folks debugging virtual memory abuse on bootup can and should > enable this to see what WARN()s come on, to see if module > auto-loading is to blame for their woes. You get 72 columns for changelog text, so you can use it :) > > Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> > --- > > Changes on this patch since the last RFC: > > o dropped the kernel_read*() patch from this series moving to > punt the issues as a udev issue now that we have proof auto-loading > is not the issue > o some spell checks > > kernel/module/Kconfig | 40 +++++++ > kernel/module/Makefile | 1 + > kernel/module/dups.c | 234 +++++++++++++++++++++++++++++++++++++++ > kernel/module/internal.h | 15 +++ > kernel/module/kmod.c | 23 +++- > 5 files changed, 309 insertions(+), 4 deletions(-) > create mode 100644 kernel/module/dups.c > > diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig > index e6df183e2c80..cc146ef4a6ac 100644 > --- a/kernel/module/Kconfig > +++ b/kernel/module/Kconfig > @@ -59,6 +59,46 @@ config MODULE_STATS > > If unsure, say N. > > +config MODULE_AUTOLOAD_SUPRESS_DUPS MODULE_DEBUG_DUPLICATE perhaps? It has nothing to do with autoloading (other than that is what userspace is doing) and you aren't suppressing anything except throwing up warnings, right? > + bool "Debug duplicate modules with auto-loading" > + help > + Module autoloading allows in-kernel code to request modules through > + the *request_module*() API calls. This in turn just calls userspace > + modprobe. Although modprobe checks to see if a module is already > + loaded before trying to load a module there is a small time window in > + which multiple duplicate requests can end up in userspace and multiple > + modprobe calls race calling finit_module() around the same time for > + duplicate modules. The finit_module() system call can consume in the > + worst case more than twice the respective module size in virtual > + memory for each duplicate module requests. Although duplicate module > + requests are non-fatal virtual memory is a limited resource and each > + duplicate module request ends up just wasting virtual memory. It's not "wasted", as it is returned when the module is determined to be a duplicate. Otherwise everyone will want this enabled as they think it will actually save memory. > + This debugging facility will create WARN() splats for duplicate module > + requests to help identify if module auto-loading is the culprit to your > + woes. Since virtual memory abuse caused by duplicate module requests > + could render a system unusable this functionality will also suppresses > + the waste in virtual memory caused by duplicate requests by sharing > + races in requests for the same module to a single unified request. > + Once a non-wait request_module() call completes a module should be > + loaded and modprobe should simply not allow new finit_module() calls. > + > + Enable this functionality to try to debug virtual memory abuse during > + boot on systems and identify if the abuse was due to module > + auto-loading. > + > + If the first module request used request_module_nowait() we cannot > + use that as the anchor to wait for duplicate module requests, since > + users of request_module() do want a proper return value. If a call > + for the same module happened earlier with request_module() though, > + then a duplicate request_module_nowait() would be detected. > + > + You want to enable this if you want to debug and see if duplicate > + module auto-loading might be causing virtual memory abuse during > + bootup. A kernel trace will be provided for each duplicate request. > + > + Disable this if you are on production. "on production"? That does not translate well, how about: Only enable this for debugging system functionality, never have it enabled on real systems. or something like that? > + > endif # MODULE_DEBUG > > config MODULE_FORCE_LOAD > diff --git a/kernel/module/Makefile b/kernel/module/Makefile > index 52340bce497e..e8b121ac39cf 100644 > --- a/kernel/module/Makefile > +++ b/kernel/module/Makefile > @@ -10,6 +10,7 @@ KCOV_INSTRUMENT_module.o := n > obj-y += main.o > obj-y += strict_rwx.o > obj-y += kmod.o > +obj-$(CONFIG_MODULE_AUTOLOAD_SUPRESS_DUPS) += dups.o > obj-$(CONFIG_MODULE_DECOMPRESS) += decompress.o > obj-$(CONFIG_MODULE_SIG) += signing.o > obj-$(CONFIG_LIVEPATCH) += livepatch.o > diff --git a/kernel/module/dups.c b/kernel/module/dups.c > new file mode 100644 > index 000000000000..903ab7c7e8f4 > --- /dev/null > +++ b/kernel/module/dups.c > @@ -0,0 +1,234 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * kmod dups - the kernel module autoloader duplicate suppressor > + * > + * Copyright (C) 2023 Luis Chamberlain <mcgrof@xxxxxxxxxx> > + */ > + > +#include <linux/module.h> > +#include <linux/sched.h> > +#include <linux/sched/task.h> > +#include <linux/binfmts.h> > +#include <linux/syscalls.h> > +#include <linux/unistd.h> > +#include <linux/kmod.h> > +#include <linux/slab.h> > +#include <linux/completion.h> > +#include <linux/cred.h> > +#include <linux/file.h> > +#include <linux/fdtable.h> > +#include <linux/workqueue.h> > +#include <linux/security.h> > +#include <linux/mount.h> > +#include <linux/kernel.h> > +#include <linux/init.h> > +#include <linux/resource.h> > +#include <linux/notifier.h> > +#include <linux/suspend.h> > +#include <linux/rwsem.h> > +#include <linux/ptrace.h> > +#include <linux/async.h> > +#include <linux/uaccess.h> > + > +DEFINE_MUTEX(kmod_dup_mutex); static? > +static LIST_HEAD(dup_kmod_reqs); > + > +struct kmod_dup_req { > + struct list_head list; > + char name[MODULE_NAME_LEN]; > + struct completion first_req_done; > + struct work_struct complete_work; > + struct delayed_work delete_work; > + int dup_ret; > +}; > + > +static struct kmod_dup_req *kmod_dup_request_lookup(char *module_name) Lock requirements? You should document that here. > +{ > + struct kmod_dup_req *kmod_req; > + > + list_for_each_entry_rcu(kmod_req, &dup_kmod_reqs, list, > + lockdep_is_held(&kmod_dup_mutex)) { > + if (strlen(kmod_req->name) == strlen(module_name) && > + !memcmp(kmod_req->name, module_name, strlen(module_name))) { > + return kmod_req; > + } > + } > + > + return NULL; > +} > + > +static void kmod_dup_request_delete(struct work_struct *work) > +{ > + struct kmod_dup_req *kmod_req; > + kmod_req = container_of(to_delayed_work(work), struct kmod_dup_req, delete_work); > + > + /* > + * The typical situation is a module successully loaded. In that > + * situation the module will be present already in userspace. If > + * new requests come in after that, userspace will already know the > + * module is loaded so will just return 0 right away. There is still > + * a small chance right after we delete this entry new request_module() > + * calls may happen after that, they can happen. These heuristics > + * are to protect finit_module() abuse for auto-loading, if modules > + * are still tryign to auto-load even if a module is already loaded, > + * that's on them, and those inneficiencies should not be fixed by > + * kmod. The inneficies there are a call to modprobe and modprobe > + * just returning 0. > + */ > + mutex_lock(&kmod_dup_mutex); > + list_del_rcu(&kmod_req->list); > + synchronize_rcu(); > + mutex_unlock(&kmod_dup_mutex); > + kfree(kmod_req); > +} > + > +static void kmod_dup_request_complete(struct work_struct *work) > +{ > + struct kmod_dup_req *kmod_req; > + > + kmod_req = container_of(work, struct kmod_dup_req, complete_work); > + > + /* > + * This will ensure that the kernel will let all the waiters get > + * informed its time to check the return value. It's time to > + * go home. > + */ > + complete_all(&kmod_req->first_req_done); > + > + /* > + * Now that we have allowed prior request_module() calls to go on > + * with life, let's schedule deleting this entry. We don't have > + * to do it right away, but we *eventually* want to do it so to not > + * let this linger forever as this is just a boot optimization for > + * possible abuses of vmalloc() incurred by finit_module() thrashing. > + */ > + queue_delayed_work(system_wq, &kmod_req->delete_work, 60 * HZ); > +} > + > +bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret) > +{ > + struct kmod_dup_req *kmod_req, *new_kmod_req; > + int ret; > + > + /* > + * Pre-allocate the entry in case we have to use it later > + * to avoid contention with the mutex. > + */ > + new_kmod_req = kzalloc(sizeof(*new_kmod_req), GFP_KERNEL); > + if (!new_kmod_req) > + return false; > + > + memcpy(new_kmod_req->name, module_name, strlen(module_name)); > + INIT_WORK(&new_kmod_req->complete_work, kmod_dup_request_complete); > + INIT_DELAYED_WORK(&new_kmod_req->delete_work, kmod_dup_request_delete); > + init_completion(&new_kmod_req->first_req_done); > + > + mutex_lock(&kmod_dup_mutex); > + > + kmod_req = kmod_dup_request_lookup(module_name); > + if (!kmod_req) { > + /* > + * If the first request that came through for a module > + * was with request_module_nowait() we cannot wait for it > + * and share its return value with other users which may > + * have used request_module() and need a proper return value > + * so just skip using them as an anchor. > + * > + * If a prior request to this one came through with > + * request_module() though, then a request_module_nowait() > + * would benefit from duplicate detection. > + */ > + if (!wait) { > + kfree(new_kmod_req); > + pr_warn("New request_module_nowait() for %s -- cannot track duplicates for this request\n", module_name); pr_dbg() as this is debugging? And did you set your pr_fmt value to make it obvious where these messages are coming from? > + mutex_unlock(&kmod_dup_mutex); > + return false; > + } > + > + /* > + * There was no duplicate, just add the request so we can > + * keep tab on duplicates later. > + */ > + pr_info("New request_module() for %s\n", module_name); Why not pr_dbg()? > + list_add_rcu(&new_kmod_req->list, &dup_kmod_reqs); > + mutex_unlock(&kmod_dup_mutex); > + return false; > + } > + mutex_unlock(&kmod_dup_mutex); > + > + /* We are dealing with a duplicate request now */ > + No blank line needed. > + kfree(new_kmod_req); > + > + /* > + * To fix these try to use try_then_request_module() instead as that > + * will check if the component you are looking for is present or not. > + * You could also just queue a single request to load the module once, > + * instead of having each and everything you need try to request for > + * the module. > + * > + * Duplicate request_module() calls can cause quite a bit of wasted > + * vmalloc() space when racing with userspace. > + */ > + WARN(1, "module-autoload: duplicate request for module %s\n", module_name); Remember, this will trigger syzbot and any system that has panic-on-warn, so be prepared for that mess. Especially the syzbot reports, that's going to cause lots of issues for you if you don't tell them to always disable this option. thanks, greg k-h