On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@xxxxxxxxxxxxxxx> wrote:
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index cf7c1cca69dd..a68c475cdea5 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >  			cpufreq_add_device("acpi-cpufreq");
> > >  	}
> > >
> > > +	/*
> > > +	 * Register CPUs that are present. get_cpu_device() is used to skip
> > > +	 * duplicate CPU descriptions from firmware.
> > > +	 */
> > > +	if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > +	    !get_cpu_device(pr->id)) {
> > > +		int ret = arch_register_cpu(pr->id);
> > > +
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Extra Processor objects may be enumerated on MP systems with
> > >  	 * less than the max # of CPUs. They should be ignored _iff
> >
> > This is interesting, because right below there is the following code:
> >
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >         int ret = acpi_processor_hotadd_init(pr);
> >
> >         if (ret)
> >                 return ret;
> > }
> >
> > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > with some extra things around it (more about that below).
> >
> > I do realize that acpi_processor_hotadd_init() is defined under
> > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> >
> > So why are the two conditionals that almost contradict each other both
> > needed? It looks like the new code could be combined with
> > acpi_processor_hotadd_init() to do the right thing in all cases.
> >
> > Now, acpi_processor_hotadd_init() does some extra things that look
> > like they should be done by the new code too.
> >
> > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> >
> > 2. It uses locking around arch_register_cpu() which doesn't seem
> > unreasonable either.
> >
> > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > the new code.
> >
> > The only thing that can be dropped from it is the _STA check AFAICS,
> > because acpi_processor_add() won't even be called if the CPU is not
> > present (and not enabled after the first patch).
> >
> > So why does the code not do 1 - 3 above?
>
> Honestly, I'm out of my depth with this and can't answer your
> questions - and I really don't want to try fiddling with this code
> because it's just too icky (even in its current form in mainline)
> to be understandable to anyone who hasn't gained a detailed knowledge
> of this code.
>
> It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> in all circumstances, what this means for invalid_logical_cpuid() and
> invalid_phys_cpuid(), what paths will be taken in each case. This code
> is already just too hairy for someone who isn't an experienced ACPI
> hacker to be able to follow and I don't see an obvious way to make it
> more readable.
>
> James' additions make it even more complex and less readable.

As an illustration of the problems I'm having here, I was just writing
a reply to this with a suggestion of transforming this code ultimately
to:

	if (!get_cpu_device(pr->id)) {
		int ret;

		if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
			ret = acpi_processor_make_enabled(pr);
		else
			ret = acpi_processor_make_present(pr);

		if (ret)
			return ret;
	}

(acpi_processor_make_present() would be acpi_processor_hotadd_init()
and acpi_processor_make_enabled() would be arch_register_cpu() at this
point.)

Then I realised that's a bad idea - because we really need to check
that pr->id is valid before calling get_cpu_device() on it, so this
won't work.

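As a rough illustration of that ordering constraint, here is a
standalone userspace model, not kernel code. The enum, the struct and
choose_reg_path() are all invented for this sketch, and the
"registered" flag merely stands in for whatever get_cpu_device() would
report. The only point it makes is that the id is validated before
anything keyed by it is consulted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; none of these names exist in the kernel. */
enum reg_path {
	REG_PATH_NONE,         /* already registered, nothing to do */
	REG_PATH_MAKE_PRESENT, /* id invalid, or CPU not present */
	REG_PATH_MAKE_ENABLED, /* id valid, CPU present, not yet registered */
};

/* Hypothetical snapshot of the state the real code would have to query. */
struct cpu_state {
	int id;          /* logical CPU id; negative is treated as invalid */
	bool present;    /* only meaningful when id is valid */
	bool registered; /* only meaningful when id is valid */
};

static enum reg_path choose_reg_path(const struct cpu_state *s)
{
	assert(s != NULL);

	/* Validate the id first; nothing keyed by it is looked at before this. */
	if (s->id < 0 || !s->present)
		return REG_PATH_MAKE_PRESENT;

	/* The id is known to be valid here, so "registered" can be trusted. */
	if (!s->registered)
		return REG_PATH_MAKE_ENABLED;

	return REG_PATH_NONE;
}
```

In this model a negative id always takes the "make present" path, no
matter what the (meaningless) registered flag happens to contain.
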
That leaves us with:

	int ret;

	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
		/* x86 et al. path */
		ret = acpi_processor_make_present(pr);
	} else if (!get_cpu_device(pr->id)) {
		/* Arm64 path */
		ret = acpi_processor_make_enabled(pr);
	} else {
		ret = 0;
	}

	if (ret)
		return ret;

Now, the next transformation would be to move !get_cpu_device(pr->id)
into acpi_processor_make_enabled() which would eliminate one of those
if() legs.

Now, if we want to somehow make the call to arch_register_cpu() common
in these two paths, the next question is what are the _precise_
semantics of acpi_map_cpu(), particularly with respect to it modifying
pr->id. Is it guaranteed to always give the same result for the same
processor described in ACPI? What is acpi_map_cpu() anyway? I can find
no documentation for it.

Then there's the question whether calling acpi_unmap_cpu() should be
done on the failure path if arch_register_cpu() fails, which is done
for the x86 path but not the Arm64 path. Should it be done for the
Arm64 path? I've no idea, but as Arm64 doesn't implement either of
these two functions, I guess they could be stubbed out and thus be
no-ops - but then we open a hole where if pr->id is invalid, we
end up passing that invalid value to arch_register_cpu() which I'm
quite sure will explode with a negative CPU number.

So, to my mind, what you're effectively asking for is a total rewrite
of all the code in and called by acpi_processor_get_info()... and that
is not something I am willing to do (because it's too far outside of
my knowledge area.)

As I said in my reply to patch 1, I think your comments on patch 2
make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
certainly outside the bounds of what I can do to progress this.

So, at this point I'm going to stand down from further participation
with this patch set as I believe I've reached the limit of what I can
do to progress it.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!