Hi Petr,
On 2024/11/18 20:54, Petr Pavlu wrote:
On 11/13/24 03:15, Song Chen wrote:
On 2024/11/12 20:56, Petr Pavlu wrote:
On 11/10/24 12:42, Song Chen wrote:
Sometimes when the kernel calls request_module to load a module
into kernel space, it doesn't pass a valid module name,
and request_module doesn't verify it either.
As a result, modprobe is invoked anyway and spends a lot of time
searching for a nonsense name.
For example, a customer reported that when a user space process
calls ioctl(fd, SIOCGIFINDEX, &ifr), the call stack in the kernel
looks like this:
dev_ioctl (net/core/dev_ioctl.c)
dev_load
request_module("netdev-%s", name);
or request_module("%s", name);
However, if the name of the NIC is empty, neither dev_load nor
request_module checks it in the first place, so modprobe searches for
the module "netdev-" in its default path, the env path and the paths
configured in etc for nothing, which adds a lot of system overhead.
To address this problem, this patch copies the va_list and introduces
a helper, is_module_name_valid, to verify the parameters one by one and
reject any that are NULL or empty. If the check fails, modprobe is not invoked.
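The check described above could look roughly like the following user space sketch. It is not the code from the patch: the names module_name_args_valid_v and module_name_args_valid are hypothetical, and the assumption that only "%s" and "%%" need to be understood is mine. Any other conversion stops the walk without rejecting the request, because the remaining arguments can no longer be read safely.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the helper from the patch: walk a printf-style
 * format and check that every "%s" argument is neither NULL nor empty.
 * Assumption: only "%s" and "%%" are understood.
 */
static bool module_name_args_valid_v(const char *fmt, va_list args)
{
	va_list ap;
	bool ok = true;

	if (!fmt)
		return false;

	va_copy(ap, args);	/* leave the caller's va_list untouched */
	for (const char *p = fmt; *p && ok; p++) {
		if (*p != '%')
			continue;
		p++;
		if (*p == '%')
			continue;	/* literal percent sign */
		if (*p != 's')
			break;		/* unknown conversion: stop, do not reject */

		const char *s = va_arg(ap, const char *);
		if (!s || !*s)
			ok = false;	/* NULL or empty string argument */
	}
	va_end(ap);
	return ok;
}

/* Variadic convenience wrapper around the va_list version. */
static bool module_name_args_valid(const char *fmt, ...)
{
	va_list args;
	bool ok;

	va_start(args, fmt);
	ok = module_name_args_valid_v(fmt, args);
	va_end(args);
	return ok;
}
```

With this sketch, module_name_args_valid("netdev-%s", "") is rejected, while a format such as "mod-%d" passes through unchecked.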
I'm not sure if I fully follow why this should be addressed at the
request_module() level. If the user repeatedly invokes SIOCGIFINDEX with
an empty name and this increases their system load, wouldn't it be
better to update the userspace to prevent this nonsense request in the
first place?
If the user process knew, it wouldn't make the mistake.
The user process should be able to check that the ifr_name passed to
SIOCGIFINDEX is empty and avoid the syscall altogether, or am I missing
something? Even if the kernel gets improved in some way to handle this
case better, I would still suggest looking at what the application is
doing and how it ends up making this call.
Yes, agreed, it's the user space process's fault after all.
Moreover, what
happens in dev_load was quite confusing; please see the code below:
    no_module = !dev;
    if (no_module && capable(CAP_NET_ADMIN))
        no_module = request_module("netdev-%s", name);
    if (no_module && capable(CAP_SYS_MODULE))
        request_module("%s", name);
Running the same process as a sysadmin or root user takes more time than
running it as a normal user. It took us a while to find the cause, which
is why I tried to fix it in the kernel.
Similarly, if something should be done in the kernel,
wouldn't it be more straightforward for dev_ioctl()/dev_load() to check
this case?
I thought about that at the beginning, but then not only dev_ioctl/dev_load
but also every other request_module caller would have to check this case,
which would be too much effort. So I switched to checking it at the beginning
of request_module, which every caller goes through.
I think the same should in principle apply to other places that might
invoke request_module() with "%s" and a bogus value. The callers can
appropriately decide if their request makes sense and should be
fixed/improved.
Callees should be fault tolerant toward their callers, or at least let
them know what is going on inside and what kind of mistake they are
making. There are a lot of such cases in the kernel, such as call_modprobe
in kernel/module/kmod.c, which checks whether orig_module_name is NULL.
Ok, I see the idea behind checking that a value passed to
request_module() to format "%s" is non-NULL.
I'm however not sure about rejecting empty strings as is also done by
the patch. Consider a call to request_module("mod%s", suffix) where the
suffix could be empty to select the default variant, or non-empty to
select e.g. some optimized version of the module. Only the caller knows
if the suffix being empty is valid or not.
I've checked if this pattern is currently used in the kernel and wasn't
able to find anything, so that is good. However, I'm not sure if
request_module() should flat-out reject this use.
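To illustrate the "mod%s" concern, here is a small user space sketch. The helper name format_module_name and the suffix "-avx2" are made up for the example. It shows that an empty suffix still formats to a complete, plausible module name, so a blanket rejection of empty "%s" arguments would also reject this use.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a module name from a fixed prefix and an
 * optional variant suffix, the way request_module("mod%s", suffix)
 * would format it. Returns the length snprintf() reports.
 */
static int format_module_name(char *buf, size_t len, const char *suffix)
{
	return snprintf(buf, len, "mod%s", suffix);
}
```

An empty suffix yields "mod" and the made-up "-avx2" suffix yields "mod-avx2"; only the caller can tell whether the first form is intended.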
I accidentally found another problem in request_module while I was
testing this patch again: if the caller just passes a NULL pointer to
request_module, as in request_module(NULL), the process crashes:
[ 2.336160] ? asm_exc_page_fault+0x2b/0x30
[ 2.336160] ? __pfx_crc64_rocksoft_notify+0x10/0x10
[ 2.336160] ? vsnprintf+0x5a/0x4f0
[ 2.336160] __request_module+0x93/0x2b0
[ 2.336160] ? __pfx_crc64_rocksoft_notify+0x10/0x10
[ 2.336160] ? notifier_call_chain+0x65/0xd0
[ 2.336160] ? __pfx_crc64_rocksoft_notify+0x10/0x10
[ 2.336160] crypto_probing_notify+0x43/0x60
(Please ignore the caller; that is test code.)
I searched the kernel code for this pattern and found it in
__trace_bprintk in kernel/trace/trace_printk.c, which checks fmt at the
beginning of the function:
    va_list ap;
    if (unlikely(!fmt))
        return 0;
Therefore, I would like to suggest that we at least add this check to
request_module too. In that case, why not go a little further and
verify every parameter's validity to provide better fault tolerance?
Besides, it costs almost nothing.
If you like this idea, I will send a v2.
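As a minimal user space sketch of the fmt check proposed above (not kernel code): the function name format_request and the choice of -EINVAL for a NULL format are assumptions. The point is only that the format pointer is tested before it reaches vsnprintf.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the formatting step of a request_module-like
 * function: reject a NULL format before handing it to vsnprintf().
 * Returns 0 on success or a negative errno-style value.
 */
static int format_request(char *name, size_t len, const char *fmt, ...)
{
	va_list args;
	int ret;

	if (!fmt)
		return -EINVAL;	/* assumed error code for a NULL format */

	va_start(args, fmt);
	ret = vsnprintf(name, len, fmt, args);
	va_end(args);

	if (ret < 0 || (size_t)ret >= len)
		return -ENAMETOOLONG;	/* output did not fit in the buffer */
	return 0;
}
```

Calling format_request(buf, sizeof(buf), NULL) then returns an error instead of faulting inside vsnprintf.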
Many thanks.
Song