On 10/20/22 03:19, Petr Mladek wrote:
On Tue 2022-10-18 15:53:03, Prarit Bhargava wrote:
On 10/18/22 14:33, Luis Chamberlain wrote:
On Sat, Oct 15, 2022 at 11:27:10AM +0200, Petr Pavlu wrote:
The patch does address a regression observed after commit 6e6de3dee51a
("kernel/module.c: Only return -EEXIST for modules that have finished
loading"). I guess it can have a Fixes tag added to the patch.
I think it is hard to split this patch into parts because the implemented
"optimization" is the fix.
git describe --contains 6e6de3dee51a
v5.3-rc1~38^2~6
I'm a bit torn about this situation. Reverting 6e6de3dee51a would be the
right thing to do, but without it, it still leaves the issue reported
by Prarit Bhargava. We need a way to resolve the issue on stable and
then your optimizations can be applied on top.
Prarit Bhargava, please review Petry's work and see if you can come up
with a sensible way to address this for stable.
I found the original thread surrounding these changes and I do not think
this solution should have been committed to the kernel. It is not a good
solution to the problem being solved and adds complexity where none is
needed IMO.
Quoting from the original thread,
Motivation for this patch is to fix an issue observed on larger machines with
many CPUs where it can take a significant amount of time during boot to run
systemd-udev-trigger.service. An x86-64 system can have already intel_pstate
active but as its CPUs can match also acpi_cpufreq and pcc_cpufreq, udev will
attempt to load these modules too. The operation will eventually fail in the
init function of a respective module where it gets recognized that another
cpufreq driver is already loaded and -EEXIST is returned. However, one uevent
is triggered for each CPU and so multiple loads of these modules will be
present. The current code then processes all such loads individually and
serializes them with the barrier in add_unformed_module().
The way to solve this is not in the module loading code, but in the udev
code by adding a new event or in the userspace which handles the loading
events.
Option 1)
Write/modify a udev rule to to use a flock userspace file lock to prevent
repeated loading. The problem with this is that it is still racy and still
consumes CPU time repeated load the ELF header and, depending on the system
(ie a large number of cpus) would still cause a boot delay. This would be
better than what we have and is worth looking at as a simple solution. I'd
like to see boot times with this change, and I'll try to come up with a
measurement on a large CPU system.
This sounds like a ping-pong between projects where to put the
complexity. Honestly, I think that it would be more reliable if
we serialized/squashed parallel loads on the kernel side.
Well, that's the world we live in. Module loading ping pongs between
udev and the kernel.
Option 2)
Create a new udev action, "add_once" to indicate to userspace that the
module only needs to be loaded one time, and to ignore further load
requests. This is a bit tricky as both kernel space and userspace would
have be modified. The udev rule would end up looking very similar to what
we now.
The benefit of option 2 is that driver writers themselves can choose which
drivers should issue "add_once" instead of add. Drivers that are known to
run on all devices at once would call "add_once" to only issue a single
load.
My concern is how to distinguish first attempts and later (fixed)
attempts.
I mean that the module load might fail at boot. But the user might
fix the root of the problem and try to load the module once again
without reboot. The other load need not be direct. It might be part
of another more complex operation. Reload of another module.
Restart of a service.
This is an interesting complication, and something to think about.
P.
It might be problematic to do this an user-friendly way.
And it might be much more complicated in the end.
Best Regards,
Petr