On Wed, May 24, 2023 at 10:49:13AM -0700, Luis Chamberlain wrote: > On Wed, May 24, 2023 at 04:15:31PM +0800, Yujie Liu wrote: > > Hi Luis, > > > > On Tue, May 23, 2023 at 10:16:17PM -0700, Luis Chamberlain wrote: > > > On Mon, May 22, 2023 at 03:51:59PM +0800, kernel test robot wrote: > > > > Hello, > > > > > > > > kernel test robot noticed "WARNING:at_kernel/module/dups.c:#kmod_dup_request_exists_wait" on: > > > > > > > > commit: 8660484ed1cf3261e89e0bad94c6395597e87599 ("module: add debugging auto-load duplicate module support") > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > [test failed on linux-next/master 798d276b39e984345d52b933a900a71fa0815928] > > > > > > > > in testcase: boot > > > > > > > > compiler: gcc-11 > > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > > > > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > > > > > > > We are not sure if this is expected as this patch is for debugging > > > > duplicate module requests issues, so we are sending this report for > > > > your information. Thanks. > > > > > > > > > > > > If you fix the issue, kindly add following tag > > > > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx> > > > > | Closes: https://lore.kernel.org/oe-lkp/202305221508.753c7b78-yujie.liu@xxxxxxxxx > > > > > > > > > > > > [ 48.705567][ T1] ------------[ cut here ]------------ > > > > [ 48.706519][ T1] module-autoload: duplicate request for module intel_qat > > > > [ 48.707866][ T1] WARNING: CPU: 0 PID: 1 at kernel/module/dups.c:183 kmod_dup_request_exists_wait (kernel/module/dups.c:183 (discriminator 1)) > > > > > > This is a real issue since CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE was enabled. > > > If you enable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE and these warnings > > > come up a bette detailed report would be better. In this case the goal > > > was to capture races of request_module() and so the idea for developers > > > is to identify the module that caused this, in this case intel_qat, and > > > then see who called the request_module() for it. They could try things > > > like try_module_get(), but could also ensure that multiple requests > > > won't be done in the code by using locking schemes or whatever. Only > > > one request should be issued. > > > > > > The trace below would show possible users but I don't even see > > > drivers/crypto/qat/qat_c3xxx/adf_drv.c in my kernel tree. > > > > > > If you don't to deal with this reporting you can just disable this > > > config option. > > > > Thanks a lot for the information. Does this indicate that there is > > indeed a multiple requests issue in that crypto driver and we could > > report to its author/owner? Thanks. > > Not for the crypto driver, there are multiple requests for > something like: > > request_module("intel_qat") > > the trace shows who it could be. A git grep shows: > > git grep request_module | grep intel_qat > drivers/crypto/intel/qat/qat_c3xxx/adf_drv.c: request_module("intel_qat"); > drivers/crypto/intel/qat/qat_c3xxxvf/adf_drv.c: request_module("intel_qat"); > drivers/crypto/intel/qat/qat_c62x/adf_drv.c: request_module("intel_qat"); > drivers/crypto/intel/qat/qat_c62xvf/adf_drv.c: request_module("intel_qat"); > drivers/crypto/intel/qat/qat_dh895xcc/adf_drv.c: request_module("intel_qat"); > drivers/crypto/intel/qat/qat_dh895xccvf/adf_drv.c: request_module("intel_qat"); > > And so there should be one request issued, ideally. > > Nothing bad happens of multiple request are sent, however duplicate > requests means that you can end up userspace trying to load multiple > requests just for one to be freed. > > This is not a serious bug. > > Soon, I'll implement a solution so that even if duplicate requests get > to userspace it won't trigger a duplicate load in-kernel, and therefore > duplicate module loads won't have to free the extra ones. At that point > this becomes less of an issue. > > request_module() just calls to modprobe, and modprobe is supposed to > check if a module is already loaded before trying finit_module(). If > two duplicate requests end up calling modprobe though there is a small > race that two concurrent finit_module() calls will happen as userspace > would not have seen one load request going in before. The issue is that > finit_module() uses vmalloc space multiple times for the size of the > module, even if the module is a duplicate. One vmalloc space is for > kernel_read*() call, the other is for the copy we want to keep around > in kernel, the final copy. If module deeecompression is used another > vmalloc is used. So in the worst case 3 vmallocs. > > A duplicate request could end up being failed only to find out a module > is already loaded, and so about 2-3 times the size of vmalloc space of > a module size could be free'd just due to a duplicate. > > We could avoid memory pressure proactively by trying to issue only one > request out. > > CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS was added to detect these possible > duplicates when they were triggered by module-autoloading, through the > request_module() API, directly from the kernel. It turns out that there > aren't many of these duplicates. That was the point to the exercise. > To proove the kernel likely wasn't the one causing tons of duplicates. > > Because if you enable CONFIG_MODULE_STATS and then cat /sys/kernel/debug/modules/stats > you will see tons of duplicates. This gets worse by the number of CPUs > you have. Seeing little reports of CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS > proves the issue is elsewhere and at this point I'm certain this is > coming from udev kmod library usage and it forking and not being able > to detect the same context for the kmod library for loading modules, > and so duplicate module request go out for tons of modules for each CPU. > > See commit 064f4536d13939 ("module: avoid allocation if module is > already present and ready") for a graph of this. > > That commit also prevents having to do one vmalloc allocation if the > module is already present. I'm going to work now on one which prevents > the first, and so duplicates requests coming from either userspace or > the module-autoloader should be harmless. But that is not merged yet, > and I still have to post the patch. > > So in the meantime CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE should only > be enabled so to try to help those who wish to be proactive and want > to enhance the kernel module auto-loader so to only issue one > request_module() for a module needed by the kernel. > > Technically CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS also implements a > way to unify requests for the same module into one, and they all > share the same result, but this works only for non-async calls to > request_module(). That config option is a debug mechanism right now, > but we could make it default later. > > > Not sure if I undertand this correctly: enabling > > CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE can help detect multiple > > requests issues in existing code, so we should report to the original > > author of the modules that have this issue, > > It would be an optimization that they could make. It is not a serious > bug. It would be a more serious bug if we had tons of these reports > all over. But a few, no. > > Come to think of it, at this point in time CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE > should just be considered informational and debugging, no need to report > anything to anyone becuase I do think I can get the vmalloc wasted > space issue figured out for userspace and kernel module-autoloader, > and so this becomes a non-issue at all. Yes, it would still be nice > to reduce latencies by having duplicate requests not be issued, but > the developers keen on that would enable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. > > So I'd say enable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE only if > you want to avoid help avoid duplicates yourself. The kconfig language > already is a bit clear about that, but we could enhance it to explain > the gains for developers who may want this. Since we can't talk about > the future though in kconfig, I think the language now is OK. > > The bigger source of duplicates has been determined to come from > userspace, I'll Cc you on a patch to fix the last vmalloc wasted > space, so that this would never incur tons of wasted vmalloc space. > > I'd recommend for now to just disable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE > unless you want to make the reports informational and describe the > situation as I just did. > > Actually, could we talk soon about testing? Because I want to automate > some more tests for modules with kdevops. Thanks a lot for the detailed explanation and guidance! We will disable CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE in the regular tests of kernel test robot, and will keep an eye on the latest updates in the thread of "module: avoid all memory pressure due to duplicates" patch series [1]. [1] https://lore.kernel.org/all/20230524213620.3509138-1-mcgrof@xxxxxxxxxx/ -- Best Regards, Yujie