On Tue, 30 Apr 2024 14:17:24 +1000
Gavin Shan <gshan@xxxxxxxxxx> wrote:

> On 4/26/24 23:51, Jonathan Cameron wrote:
> > Make the per_cpu(processors, cpu) entries available earlier so that
> > they are available in arch_register_cpu() as ARM64 will need access
> > to the acpi_handle to distinguish between acpi_processor_add()
> > and earlier registration attempts (which will fail as _STA cannot
> > be checked).
> >
> > Reorder the remove flow to clear this per_cpu() after
> > arch_unregister_cpu() has completed, allowing it to be used in
> > there as well.
> >
> > Note that on x86 for the CPU hotplug case, the pr->id prior to
> > acpi_map_cpu() may be invalid. Thus the per_cpu() structures
> > must be initialized after that call or after checking the ID
> > is valid (not hotplug path).
> >
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >
> > ---
> > v8: On buggy bios detection when setting per_cpu structures
> >     do not carry on.
> >     Fix up the clearing of per cpu structures to remove unwanted
> >     side effects and ensure an error code isn't use to reference them.
> > ---
> >  drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++--------------
> >  1 file changed, 48 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index ba0a6f0ac841..3b180e21f325 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -183,8 +183,38 @@ static void __init acpi_pcc_cpufreq_init(void) {}
> >  #endif /* CONFIG_X86 */
> >
> >  /* Initialization */
> > +static DEFINE_PER_CPU(void *, processor_device_array);
> > +
> > +static bool acpi_processor_set_per_cpu(struct acpi_processor *pr,
> > +				       struct acpi_device *device)
> > +{
> > +	BUG_ON(pr->id >= nr_cpu_ids);
>
> One blank line after BUG_ON() if we need to follow original implementation.

Sure unintentional - I'll put that back.

>
> > +	/*
> > +	 * Buggy BIOS check.
> > +	 * ACPI id of processors can be reported wrongly by the BIOS.
> > +	 * Don't trust it blindly
> > +	 */
> > +	if (per_cpu(processor_device_array, pr->id) != NULL &&
> > +	    per_cpu(processor_device_array, pr->id) != device) {
> > +		dev_warn(&device->dev,
> > +			 "BIOS reported wrong ACPI id %d for the processor\n",
> > +			 pr->id);
> > +		/* Give up, but do not abort the namespace scan. */
>
> It depends on how the return value is handled by the caller if the namespace
> is continued to be scanned. The caller can be acpi_processor_hotadd_init()
> and acpi_processor_get_info() after this patch is applied. So I think this
> specific comment need to be moved to the caller.

Good point. This gets messy and was an unintended change.

Previously the options were:
1) acpi_processor_get_info() failed for other reasons - this code was never called.
2) acpi_processor_get_info() succeeded without acpi_processor_hotadd_init() (non hotplug).
   This code then ran and would paper over the problem doing a bunch of cleanup under err.
3) acpi_processor_get_info() succeeded with acpi_processor_hotadd_init() called.
   This code then ran and would paper over the problem doing a bunch of cleanup under err.

We should maintain that or argue cleanly against it.

This isn't helped by the fact I have no idea which cases we care about for that BIOS
bug handling. Do any of those BIOSes ever do hotplug? Guess we have to try and maintain
whatever protection this was offering.

Also, the original code leaks data in some paths and I have limited idea
of whether it is intentional or not. So to tidy up the issue that you've identified
I'll need to try and make that code consistent first.

I suspect the only way to do that is going to be to duplicate the allocations we
'want' to leak to deal with the BIOS bug detection.

For example acpi_processor_get_info() failing leaks pr and pr->throttling.shared_cpu_map
before this series.
After this series we need pr to leak because it's used for the detection via
processor_device_array.

I'll work through this but it's going to be tricky to tell if we get it right.
Step 1 will be closing the existing leaks and then we will have something
consistent to build on.

>
> Besides, it seems acpi_processor_set_per_cpu() isn't properly called and
> memory leakage can happen. More details are given below.
>
> > +		return false;
> > +	}
> > +	/*
> > +	 * processor_device_array is not cleared on errors to allow buggy BIOS
> > +	 * checks.
> > +	 */
> > +	per_cpu(processor_device_array, pr->id) = device;
> > +	per_cpu(processors, pr->id) = pr;
> > +
> > +	return true;
> > +}
> > +
> >  #ifdef CONFIG_ACPI_HOTPLUG_CPU
> > -static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > +static int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > +				      struct acpi_device *device)
> >  {
> >  	int ret;
> >
> > @@ -198,8 +228,15 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >  	if (ret)
> >  		goto out;
> >
> > +	if (!acpi_processor_set_per_cpu(pr, device)) {
> > +		acpi_unmap_cpu(pr->id);
> > +		goto out;
> > +	}
> > +
>
> With the 'goto out', zero is returned from acpi_processor_hotadd_init() to acpi_processor_get_info().
> The zero return value is carried from acpi_map_cpu() in acpi_processor_hotadd_init(). If I'm correct,
> we need return errno from acpi_processor_get_info() to acpi_processor_add() so that cleanup can be
> done. For example, the cleanup corresponding to the 'err' tag can be done in acpi_processor_add().
> Otherwise, we will have memory leakage.
>
> >  	ret = arch_register_cpu(pr->id);
> >  	if (ret) {
> > +		/* Leave the processor device array in place to detect buggy bios */
> > +		per_cpu(processors, pr->id) = NULL;
> >  		acpi_unmap_cpu(pr->id);
> >  		goto out;
> >  	}
> > @@ -217,7 +254,8 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >  	return ret;
> >  }
> >  #else
> > -static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
> > +static inline int acpi_processor_hotadd_init(struct acpi_processor *pr,
> > +					     struct acpi_device *device)
> >  {
> >  	return -ENODEV;
> >  }
> > @@ -316,10 +354,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >  	 * because cpuid <-> apicid mapping is persistent now.
> >  	 */
> >  	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > -		int ret = acpi_processor_hotadd_init(pr);
> > +		int ret = acpi_processor_hotadd_init(pr, device);
> >
> >  		if (ret)
> >  			return ret;
> > +	} else {
> > +		if (!acpi_processor_set_per_cpu(pr, device))
> > +			return 0;
> >  	}
> >
>
> For non-hotplug case, we still need pass the error to acpi_processor_add() so that
> cleanup corresponding 'err' tag can be done. Otherwise, we will have memory leakage.
>
> >  	/*
> > @@ -365,8 +406,6 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >  	 * (cpu_data(cpu)) values, like CPU feature flags, family, model, etc.
> >  	 * Such things have to be put in and set up by the processor driver's .probe().
> >  	 */
> > -static DEFINE_PER_CPU(void *, processor_device_array);
> > -
> >  static int acpi_processor_add(struct acpi_device *device,
> >  			      const struct acpi_device_id *id)
> >  {
> > @@ -395,28 +434,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >  	if (result) /* Processor is not physically present or unavailable */
> >  		return 0;
> >
> > -	BUG_ON(pr->id >= nr_cpu_ids);
> > -
> > -	/*
> > -	 * Buggy BIOS check.
> > - * Don't trust it blindly > > - */ > > - if (per_cpu(processor_device_array, pr->id) != NULL && > > - per_cpu(processor_device_array, pr->id) != device) { > > - dev_warn(&device->dev, > > - "BIOS reported wrong ACPI id %d for the processor\n", > > - pr->id); > > - /* Give up, but do not abort the namespace scan. */ > > - goto err; > > - } > > - /* > > - * processor_device_array is not cleared on errors to allow buggy BIOS > > - * checks. > > - */ > > - per_cpu(processor_device_array, pr->id) = device; > > - per_cpu(processors, pr->id) = pr; > > - > > dev = get_cpu_device(pr->id); > > if (!dev) { > > result = -ENODEV; > > @@ -469,10 +486,6 @@ static void acpi_processor_remove(struct acpi_device *device) > > device_release_driver(pr->dev); > > acpi_unbind_one(pr->dev); > > > > - /* Clean up. */ > > - per_cpu(processor_device_array, pr->id) = NULL; > > - per_cpu(processors, pr->id) = NULL; > > - > > cpu_maps_update_begin(); > > cpus_write_lock(); > > > > @@ -480,6 +493,10 @@ static void acpi_processor_remove(struct acpi_device *device) > > arch_unregister_cpu(pr->id); > > acpi_unmap_cpu(pr->id); > > > > + /* Clean up. */ > > + per_cpu(processor_device_array, pr->id) = NULL; > > + per_cpu(processors, pr->id) = NULL; > > + > > cpus_write_unlock(); > > cpu_maps_update_done(); > > > > Thanks, > Gavin >