Patch "module: warn about excessively long module waits" has been added to the 6.10-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 11 Aug 2024 21:11:18 -0400

This is a note to let you know that I've just added the patch titled

    module: warn about excessively long module waits

to the 6.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     module-warn-about-excessively-long-module-waits.patch
and it can be found in the queue-6.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 1554ec968d2bdafc81b5fc52139f71b9bb305d15
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Thu Aug 8 12:29:40 2024 -0700

    module: warn about excessively long module waits
    
    [ Upstream commit cb5b81bc9a448f8db817566f60f92e2ea788ea0f ]
    
    Russell King reported that the arm cbc(aes) crypto module hangs when
    loaded, and Herbert Xu bisected it to commit 9b9879fc0327 ("modules:
    catch concurrent module loads, treat them as idempotent"), and noted:
    
     "So what's happening here is that the first modprobe tries to load a
      fallback CBC implementation, in doing so it triggers a load of the
      exact same module due to module aliases.
    
      IOW we're loading aes-arm-bs which provides cbc(aes). However, this
      needs a fallback of cbc(aes) to operate, which is made out of the
      generic cbc module + any implementation of aes, or ecb(aes). The
      latter happens to also be provided by aes-arm-cb so that's why it
      tries to load the same module again"
    
    So loading the aes-arm-bs module ends up wanting to recursively load
    itself, and the recursive load then ends up waiting for the original
    module load to complete.
    
    This is a regression, in that it used to be that we just tried to load
    the module multiple times, and then as we went on to install it the
    second time we would instead just error out because the module name
    already existed.
    
    That is actually also exactly what the original "catch concurrent loads"
    patch did in commit 9828ed3f695a ("module: error out early on concurrent
    load of the same module file"), but it turns out that it ends up being
    racy, in that erroring out before the module has been fully initialized
    will cause failures in dependent module loading.
    
    See commit ac2263b588df (which was the revert of that "error out early")
    commit for details about why erroring out before the module has been
    initialized is actually fundamentally racy.
    
    Now, for the actual recursive module load (as opposed to just
    concurrently loading the same module twice), the race is not an issue.
    
    At the same time it's hard for the kernel to see that this is recursion,
    because the module load is always done from a usermode helper, so the
    recursion is not some simple callchain within the kernel.
    
    End result: this is not the real fix, but this at least adds a warning
    for the situation (admittedly much too late for all the debugging pain
    that Russell and Herbert went through) and if we can come to a
    resolution on how to detect the recursion properly, this re-organizes
    the code to make that easier.
    
    Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@xxxxxxxxxxxxxxxxxxxxx/
    Reported-by: Russell King <linux@xxxxxxxxxxxxxxx>
    Debugged-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Stable-dep-of: 2124d84db293 ("module: make waiting for a concurrent module loader interruptible")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/module/main.c b/kernel/module/main.c
index d18a94b973e10..7445d27ce3cdc 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3180,15 +3180,28 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
 	if (!f || !(f->f_mode & FMODE_READ))
 		return -EBADF;
 
-	/* See if somebody else is doing the operation? */
-	if (idempotent(&idem, file_inode(f))) {
-		wait_for_completion(&idem.complete);
-		return idem.ret;
+	/* Are we the winners of the race and get to do this? */
+	if (!idempotent(&idem, file_inode(f))) {
+		int ret = init_module_from_file(f, uargs, flags);
+		return idempotent_complete(&idem, ret);
 	}
 
-	/* Otherwise, we'll do it and complete others */
-	return idempotent_complete(&idem,
-		init_module_from_file(f, uargs, flags));
+	/*
+	 * Somebody else won the race and is loading the module.
+	 *
+	 * We have to wait for it forever, since our 'idem' is
+	 * on the stack and the list entry stays there until
+	 * completed (but we could fix it under the idem_lock)
+	 *
+	 * It's also unclear what a real timeout might be,
+	 * but we could maybe at least make this killable
+	 * and remove the idem entry in that case?
+	 */
+	for (;;) {
+		if (wait_for_completion_timeout(&idem.complete, 10*HZ))
+			return idem.ret;
+		pr_warn_once("module '%pD' taking a long time to load", f);
+	}
 }
 
 SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)