On Fri, Jan 18, 2013 at 2:12 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> From 4983f3b51e18d008956dd113e0ea2f252774cefc Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@xxxxxxxxxx>
> Date: Fri, 18 Jan 2013 14:05:57 -0800
>
> Synchronous request_module() from an async worker can lead to deadlock
> because module init path may invoke async_synchronize_full(). The
> async worker waits for request_module() to complete and the module
> loading waits for the async task to finish. This bug happened in the
> block layer because of default elevator auto-loading.
>
> Block layer has been updated not to do default elevator auto-loading
> and it has been decided to disallow synchronous request_module() from
> async workers.
>
> Trigger WARN_ON_ONCE() on synchronous request_module() from async
> workers.
>
> For more details, please refer to the following thread.
>
> http://thread.gmane.org/gmane.linux.kernel/1420814
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reported-by: Alex Riesen <raa.lkml@xxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
> ---
>  kernel/kmod.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 1c317e3..ecd42b4 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -38,6 +38,7 @@
>  #include <linux/suspend.h>
>  #include <linux/rwsem.h>
>  #include <linux/ptrace.h>
> +#include <linux/async.h>
>  #include <asm/uaccess.h>
>
>  #include <trace/events/module.h>
> @@ -130,6 +131,14 @@ int __request_module(bool wait, const char *fmt, ...)
>  #define MAX_KMOD_CONCURRENT 50	/* Completely arbitrary value - KAO */
>  	static int kmod_loop_msg;
>
> +	/*
> +	 * We don't allow synchronous module loading from async. Module
> +	 * init may invoke async_synchronize_full() which will end up
> +	 * waiting for this task which already is waiting for the module
> +	 * loading to complete, leading to a deadlock.
> +	 */
> +	WARN_ON_ONCE(wait && current_is_async());
> +

If a builtin driver does async probing before we are even able to load
modules, this causes a spurious warning splat. Here's a report by
Marek [1].

I took a stab at not warning, at least for drivers that do async
probing before the initcalls are done, but then I got confused [2]
trying to work out the earliest point during boot at which
request_module() can succeed.

If someone can clear up my confusion, I can try to avoid this warning
for calls to request_module() made before we can load any modules. Any
other ideas for making this warning less trigger-happy about false
positives?

[1] - https://lore.kernel.org/lkml/d5796286-ec24-511a-5910-5673f8ea8b10@xxxxxxxxxxx/
[2] - https://lore.kernel.org/lkml/CAGETcx-MHwex8tHLB1d71MAP01-3OPDZSNCUBb3iT+BtrugJmQ@xxxxxxxxxxxxxx/

Another question (pardon my ignorance) is whether we need to call
async_synchronize_full() at the end of do_init_module(), or whether we
can limit it to a smaller domain. Looking at the history, I see that
this call was added by Linus in commit d6de2c80e9d7 ("async: Fix module
loading async-work regression"). Are we doing the blanket
async_synchronize_full() only because we are not keeping proper track
of the async domains?

And if so, what if we have an async domain per module, and any use of
async_schedule*() triggered by that module is tied to the module's
async domain? Then we'd only need to sync that module's domain and we
wouldn't hit any deadlock issues.

Grepping for async_schedule*() calls, I see only about 30 instances. At
a glance, it looks like most cases are:
1. Calls that have a device/driver from which we can find the related
   module and tie the async_schedule*() call to that module's domain.
2. Direct async_schedule*() calls from module_init(), which we can tie
   directly to the module's domain.
3. Other?

Is this idea worth pursuing? Or am I going in a completely wrong
direction?
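To make the per-module domain idea a bit more concrete, here is a toy
userspace model of the waiting relationship. It is only a sketch and not
kernel code: struct async_item, DOMAIN_GLOBAL, SYNC_ALL, sync_waits_on()
and request_module_deadlocks() are all made-up names for illustration,
and the model assumes the only thing that matters is whether the
synchronize done at the end of module init has to wait for the async
item that is blocked in a synchronous request_module().

```c
#include <stdbool.h>

/* Hypothetical model: domain 0 stands for the global async domain. */
#define DOMAIN_GLOBAL 0
/* SYNC_ALL models a blanket synchronize over every pending item. */
#define SYNC_ALL (-1)

struct async_item {
	int domain;	/* domain the item was scheduled in */
	bool pending;	/* still queued or running */
};

/* Does a synchronize over sync_domain have to wait for this item? */
static bool sync_waits_on(const struct async_item *item, int sync_domain)
{
	if (!item->pending)
		return false;
	return sync_domain == SYNC_ALL || item->domain == sync_domain;
}

/*
 * caller is an async item blocked in a synchronous request_module().
 * Module init then synchronizes over sync_domain. If that synchronize
 * has to wait for caller, each side waits on the other forever.
 */
static bool request_module_deadlocks(const struct async_item *caller,
				     int sync_domain)
{
	return sync_waits_on(caller, sync_domain);
}
```

In this model, a pending worker in DOMAIN_GLOBAL deadlocks when module
init synchronizes with SYNC_ALL, but not when module init synchronizes
only over its own domain (any id other than the worker's), which is the
property the per-module domain proposal relies on.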
Btw, I did see Linus's suggestion in one of the emails in this thread
(?) about just doing an async_synchronize_full() on device open. That
seems like it would work too, but I'm wary of touching any file open
code path because I expect it to be a hot path.

-Saravana