Re: Fixing gave up waiting for init of module libcrc32c.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 20 Mar 2010 10:59:59 pm Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.
> 
> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.

Not really.  The kernel code is pretty simple: if we can't find a symbol
we need, fail the load.  To avoid nasty spurious races, we *do* wait if the
symbol we need is in a module which is still initializing.  But we time out
after 30 seconds to avoid a module trainwreck.

The real fix here is to drop the lock, like Brandon suggested, but we need
to do it more carefully: when we re-acquire the lock we need to re-lookup
the symbol in case the module has vanished or changed.

Brandon, I can't see how libcrc32c's module_init calls crypto_alloc_shash,
but the problem is reproducible with simple example modules.  Does it fix
your problem?

Tim: note that use_module() return value changes.

Thanks,
Rusty.

module: drop the lock while waiting for module to complete initialization.

This fixes "gave up waiting for init of module libcrc32c." which
happened at boot time due to multiple parallel module loads.

The problem was a deadlock: we wait for a module to finish
initializing, but we keep the module_lock mutex so it can't complete.
In particular, this could reasonably happen if libcrc32c does a
request_module() in its initialization routine.

So we change use_module() to return an errno rather than a bool, and if
it's -EBUSY we drop the lock and wait in the caller, then reaquire the
lock.

Reported-by: Brandon Philips <brandon@xxxxxxxx>
Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -510,33 +510,25 @@ int use_module(struct module *a, struct 
 	struct module_use *use;
 	int no_warn, err;
 
-	if (b == NULL || already_uses(a, b)) return 1;
+	if (b == NULL || already_uses(a, b)) return 0;
 
 	/* If we're interrupted or time out, we fail. */
-	if (wait_event_interruptible_timeout(
-		    module_wq, (err = strong_try_module_get(b)) != -EBUSY,
-		    30 * HZ) <= 0) {
-		printk("%s: gave up waiting for init of module %s.\n",
-		       a->name, b->name);
-		return 0;
-	}
-
-	/* If strong_try_module_get() returned a different error, we fail. */
+	err = strong_try_module_get(b);
 	if (err)
-		return 0;
+		return err;
 
 	DEBUGP("Allocating new usage for %s.\n", a->name);
 	use = kmalloc(sizeof(*use), GFP_ATOMIC);
 	if (!use) {
 		printk("%s: out of memory loading\n", a->name);
 		module_put(b);
-		return 0;
+		return -ENOMEM;
 	}
 
 	use->module_which_uses = a;
 	list_add(&use->list, &b->modules_which_use_me);
 	no_warn = sysfs_create_link(b->holders_dir, &a->mkobj.kobj, a->name);
-	return 1;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -823,7 +815,7 @@ static inline void module_unload_free(st
 
 int use_module(struct module *a, struct module *b)
 {
-	return strong_try_module_get(b) == 0;
+	return strong_try_module_get(b);
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
 	struct module *owner;
 	const struct kernel_symbol *sym;
 	const unsigned long *crc;
+	DEFINE_WAIT(wait);
+	int err;
+	long timeleft = 30 * HZ;
 
+again:
 	sym = find_symbol(name, &owner, &crc,
 			  !(mod->taints & (1 << TAINT_PROPRIETARY_MODULE)), true);
-	/* use_module can fail due to OOM,
-	   or module initialization or unloading */
-	if (sym) {
-		if (!check_version(sechdrs, versindex, name, mod, crc, owner)
-		    || !use_module(mod, owner))
-			sym = NULL;
+	if (!sym)
+		return NULL;
+
+	if (!check_version(sechdrs, versindex, name, mod, crc, owner))
+		return NULL;
+
+	prepare_to_wait(&module_wq, &wait, TASK_INTERRUPTIBLE);
+	err = use_module(mod, owner);
+	if (likely(!err) || err != -EBUSY || signal_pending(current)) {
+		finish_wait(&module_wq, &wait);
+		return err ? NULL : sym;
 	}
-	return sym;
+
+	/* Module is still loading.  Drop lock and wait. */
+	mutex_unlock(&module_mutex);
+	timeleft = schedule_timeout(timeleft);
+	mutex_lock(&module_mutex);
+	finish_wait(&module_wq, &wait);
+
+	/* Module might be gone entirely, or replaced.  Re-lookup. */
+	if (timeleft)
+		goto again;
+
+	printk("%s: gave up waiting for init of module %s.\n",
+	       mod->name, owner->name);
+	return NULL;
 }
 
 /*

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux