On Fri, 12 Apr 2024 20:30:40 +0200 "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote: > On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron > <Jonathan.Cameron@xxxxxxxxxx> wrote: > > > > From: James Morse <james.morse@xxxxxxx> > > > > The arm64 specific arch_register_cpu() call may defer CPU registration > > until the ACPI interpreter is available and the _STA method can > > be evaluated. > > > > If this occurs, then a second attempt is made in > > acpi_processor_get_info(). Note that the arm64 specific call has > > not yet been added so for now this will never be successfully > > called. > > > > Systems can still be booted with 'acpi=off', or not include an > > ACPI description at all as in these cases arch_register_cpu() > > will not have deferred registration when first called. > > > > This moves the CPU register logic back to a subsys_initcall(), > > while the memory nodes will have been registered earlier. > > Note this is where the call was prior to the cleanup series so > > there should be no side effects of moving it back again for this > > specific case. > > > > [PATCH 00/21] Initial cleanups for vCPU HP. > > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@xxxxxxxxxxxxxxxxxxxxx/ > > > > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES") > > > > Signed-off-by: James Morse <james.morse@xxxxxxx> > > Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx> > > Tested-by: Miguel Luis <miguel.luis@xxxxxxxxxx> > > Tested-by: Vishnu Pajjuri <vishnu@xxxxxxxxxxxxxxxxxxxxxx> > > Tested-by: Jianyong Wu <jianyong.wu@xxxxxxx> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx> > > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > --- > > v5: Update commit message to make it clear this is moving the > > init back to where it was until very recently. > > > > No longer change the condition in the earlier registration point > > as that will be handled by the arm64 registration routine > > deferring until called again here. > > --- > > drivers/acpi/acpi_processor.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > index 93e029403d05..c78398cdd060 100644 > > --- a/drivers/acpi/acpi_processor.c > > +++ b/drivers/acpi/acpi_processor.c > > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device) > > > > c = &per_cpu(cpu_devices, pr->id); > > ACPI_COMPANION_SET(&c->dev, device); > > + /* > > + * Register CPUs that are present. get_cpu_device() is used to skip > > + * duplicate CPU descriptions from firmware. > > + */ > > + if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) && > > + !get_cpu_device(pr->id)) { > > + int ret = arch_register_cpu(pr->id); > > + > > + if (ret) > > + return ret; > > + } > > + > > /* > > * Extra Processor objects may be enumerated on MP systems with > > * less than the max # of CPUs. They should be ignored _iff > > -- > > I am still unsure why there need to be two paths calling > arch_register_cpu() in acpi_processor_get_info(). I replied further down the thread, but the key point was to maintain the strong distinction between 'what' was done in a real hotplug path vs one where onlining was all. We can relax that but it goes contrary to the careful dance that was needed to get any agreement to the ARM architecture aspects of this. > > Just below the comment partially pulled into the patch context above, > there is this code: > > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) { > int ret = acpi_processor_hotadd_init(pr); > > if (ret) > return ret; > } > > For the sake of the argument, fold acpi_processor_hotadd_init() into > it and drop the redundant _STA check from it: If we combine these, the _STA check is necessary because we will call this path for delayed onlining of ARM64 CPUs (if the earlier registration code call or arch_register_cpu() returned -EPROBE defer). That's the only way we know that a given CPU is online capable but firmware is saying we can't bring it online yet (it may be be vHP later). > > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) { > if (invalid_phys_cpuid(pr->phys_id)) > return -ENODEV; > > cpu_maps_update_begin(); > cpus_write_lock(); > > ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id); I read that call as acpi_map_cpu_for_physical_cpu_hotplug() but we could make it equivalent of. acpi_map_cpu_for_whatever_cpu_hotplug() (I'm not proposing those names though ;) in which case it is fine to just stub it out on ARM64. > if (ret) { > cpus_write_unlock(); > cpu_maps_update_done(); > return ret; > } > ret = arch_register_cpu(pr->id); > if (ret) { > acpi_unmap_cpu(pr->id); > > cpus_write_unlock(); > cpu_maps_update_done(); > return ret; > } > pr_info("CPU%d has been hot-added\n", pr->id); > pr->flags.need_hotplug_init = 1; This one needs more careful handling because we are calling this for non hotplug cases on arm64 in which case we end up setting this for initially online CPUs - thus if we offline and online them again via sysfs /sys/bus/cpu/device/cpuX/online it goes through the hotplug path and should not. So I need a way to detect if we are hotplugging the cpu or not. Is there a standard way to do this? I haven't figured out how to use flags in drivers to communicate this state. > > cpus_write_unlock(); > cpu_maps_update_done(); > } > > so I'm not sure why this cannot be combined with the new code. > > Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls. > What's the difference then? The locking, which should be fine if I'm > not mistaken and need_hotplug_init that needs to be set if this code > runs after the processor driver has loaded AFAICS. That's the bit that I'm currently finding a challenge. Is there a clean way to detect that? Jonathan