On 10/24/22 08:37, Petr Pavlu wrote:
On 10/18/22 21:53, Prarit Bhargava wrote:
Quoting from the original thread,
Motivation for this patch is to fix an issue observed on larger machines with
many CPUs where it can take a significant amount of time during boot to run
systemd-udev-trigger.service. An x86-64 system can already have intel_pstate
active, but since its CPUs also match acpi_cpufreq and pcc_cpufreq, udev will
attempt to load these modules too. The operation eventually fails in the
init function of the respective module, where it is recognized that another
cpufreq driver is already loaded and -EEXIST is returned. However, one uevent
is triggered for each CPU, so multiple load attempts of these modules are
made. The current code then processes all such loads individually and
serializes them with the barrier in add_unformed_module().
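As a toy illustration of the duplicate-load pattern (pure shell, nothing is really loaded; the CPU count and module name are made up):

```shell
#!/bin/sh
# Toy model: one "add" uevent per logical CPU leads to one load attempt
# per CPU; only the first succeeds, the rest would get -EEXIST, yet the
# kernel still has to process and serialize every attempt.
loaded=""
attempts=0
eexist_failures=0
for cpu in 0 1 2 3 4 5 6 7; do            # pretend 8 logical CPUs
  attempts=$((attempts + 1))
  if [ -z "$loaded" ]; then
    loaded="acpi_cpufreq"                 # first attempt wins
  else
    eexist_failures=$((eexist_failures + 1))
  fi
done
echo "attempts=$attempts eexist=$eexist_failures"
```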
The way to solve this is not in the module loading code, but rather in the
udev code, by adding a new event, or in the userspace that handles the
loading events.
Option 1)
Write or modify a udev rule to use a flock userspace file lock to
prevent repeated loading. The problem with this is that it is still
racy and still consumes CPU time repeatedly loading the ELF header and,
depending on the system (i.e., one with a large number of CPUs), would
still cause a boot delay. This would be better than what we have now and
is worth looking at as a simple solution. I'd like to see boot times with
this change, and I'll try to come up with a measurement on a large CPU system.
It is not immediately clear to me how this can be done as a udev rule. You
mention that you'll try to test this on a large CPU system. Does it mean that
you have a prototype implemented already? If yes, could you please share it?
Hi Petr,
Sorry, I haven't had a chance to actually test this out, but I see this
problem with acpi_cpufreq and other multiple-CPU drivers, which load
once per logical CPU. I was thinking of adding a udev rule like:
ACTION!="add", GOTO="acpi_cpufreq_end"
# I may have to add CPU modaliases here to get this to work correctly
ENV{MODALIAS}=="acpi:ACPI0007:", GOTO="acpi_cpufreq_start"
GOTO="acpi_cpufreq_end"
LABEL="acpi_cpufreq_start"
ENV{DELAY_MODALIAS}="$env{MODALIAS}"
ENV{MODALIAS}=""
PROGRAM="/bin/sh -c 'flock -n /tmp/delay_acpi_cpufreq sleep 2'", RESULT=="", RUN{builtin}+="kmod load $env{DELAY_MODALIAS}"
LABEL="acpi_cpufreq_end"
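To sanity-check the flock -n behaviour the rule relies on, here is a standalone sketch (throwaway temp files, plain sh, no udev or kmod involved): four concurrent "loaders" race for one lock, and only the first one does the work while the rest fail fast:

```shell
#!/bin/sh
# Sketch only: simulate four udev workers racing to "load" a module.
lockfile=$(mktemp)   # stand-in for /tmp/delay_acpi_cpufreq
loadlog=$(mktemp)    # records how many "loads" actually ran

for worker in 1 2 3 4; do
  # -n: fail immediately instead of blocking if the lock is already held
  flock -n "$lockfile" sh -c "echo loaded >> '$loadlog'; sleep 1" &
done
wait

loads=$(wc -l < "$loadlog")
echo "successful loads: $loads"   # only the first worker gets the lock
rm -f "$lockfile" "$loadlog"
```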
My reading is that one would need to update the "MODALIAS" rule in
80-drivers.rules [1] to do this locking. However, that just collects
'kmod load' (builtin) for udev to execute after all rules are processed. It
would then be required to synchronize udev workers to prevent repeated
loading?
Yes, that would be the case.
Option 2)
Create a new udev action, "add_once", to indicate to userspace that the
module only needs to be loaded one time and that further load requests
should be ignored. This is a bit tricky as both kernel space and userspace
would have to be modified. The udev rule would end up looking very similar
to what we have now.
The benefit of option 2 is that driver writers themselves can choose
which drivers should issue "add_once" instead of "add". Drivers that are
known to handle all devices at once would issue "add_once" so that only
a single load is triggered.
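For illustration, the userspace side of option 2 could then stay close to the existing one-line MODALIAS rule; note that "add_once" below is hypothetical and not an existing uevent action:

```
# Hypothetical sketch: "add_once" does not exist today; the kernel would
# have to emit it for drivers that opt in.
ACTION=="add_once", ENV{MODALIAS}=="?*", RUN{builtin}+="kmod load $env{MODALIAS}"
```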
On the device event side, I wonder more whether it would be possible to avoid
tying the cpufreq and edac modules to individual CPU devices. Maybe their loading
could be attached to some platform device, even if it means introducing an
auxiliary device for this purpose? I need to look a bit more into this idea.
That's an interesting idea and something I had not considered. Creating
a virtual device and loading against that device would be a much better
(easier?) solution.
P.
[1] https://github.com/systemd/systemd/blob/4856f63846fc794711e1b8ec970e4c56494cd320/rules.d/80-drivers.rules
Thanks,
Petr