On Fri, Dec 1, 2023 at 12:32 AM Zhang, Rui <rui.zhang@xxxxxxxxx> wrote: > > On Thu, 2023-11-23 at 20:50 +0800, Exchange Online wrote: > > Hi, John, > > > > Thanks for catching this issue. > > > > On Wed, 2023-11-22 at 22:19 +0000, John Sperbeck wrote: > > > I have a platform with both LOCAL_APIC and LOCAL_X2APIC entries for > > > each CPU. However, the ids for the LOCAL_APIC entries are all > > > invalid ids of 255, so they have always been skipped in > > > acpi_parse_lapic() > > > by this code from f3bf1dbe64b6 ("x86/acpi: Prevent LAPIC id 0xff > > > from > > > being > > > accounted"): > > > > > > /* Ignore invalid ID */ > > > if (processor->id == 0xff) > > > return 0; > > > > > > With the change in this thread, the return value of 0 means that > > > the > > > 'count' variable in acpi_parse_entries_array() is incremented. The > > > positive return value means that 'has_lapic_cpus' is set, even > > > though > > > no entries were actually matched. > > > > So in acpi_parse_madt_lapic_entries, without this patch, > > madt_proc[0].count is a positive value on this platform, right? > > > > This sounds like a potential issue because the following checks to > > fall > > back to MPS mode can also break. (If all LOCAL_APIC entries have > > apic_id 0xff and all LOCAL_X2APIC entries have apic_id 0xffffffff) > > > > > Then, when the MADT is iterated > > > with acpi_parse_x2apic(), the x2apic entries with ids less than 255 > > > are skipped and most of my CPUs aren't recognized. > > > > > > I think the original version of this change was okay for this case > > > in > > > https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/ > > > > Yeah. > > > > But if we want to fix the potential issue above, we need to do > > something more. > > > > Say we can still use acpi_table_parse_entries_array() and convert > > acpi_parse_lapic()/acpi_parse_x2apic() to > > acpi_subtable_proc.handler_arg and save the real valid entries via > > the > > parameter. > > > > or can we just use num_processors & disabled_cpus to check if there > > is > > any CPU probed when parsing LOCAL_APIC/LOCAL_X2APIC entires? > > > > Hi, John, > > As a quick fix, I'm not going to fix the "potential issue" describes > above because we have not seen a real problem caused by this yet. > > Can you please try the below patch to confirm if the problem is gone on > your system? > This patch falls back to the previous way as sent at > https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/ > > thanks, > rui > > From bdb45e241b4fea8a12b958e490979e96b064e43d Mon Sep 17 00:00:00 2001 > From: Zhang Rui <rui.zhang@xxxxxxxxx> > Date: Fri, 1 Dec 2023 15:06:34 +0800 > Subject: [PATCH] x86/acpi: Do strict X2APIC ID check only when an enabled CPU > is enumerated via LAPIC > > Commit 8e9c42d776d6 ("x86/acpi: Ignore invalid x2APIC entries") does > strict X2APIC ID check if LAPIC contains valid CPUs by checking the > acpi_table_parse_madt() return value. > > This is wrong because acpi_table_parse_madt() return value only > represents the number of legal entries parsed. For example, LAPIC entry > with LAPIC ID 0xff is counted as a legal entry, but it doesn't describe > a valid CPU. > > This causes issues on a system which has 0xff LAPIC ID in all LAPIC > entries. Because the code does strict X2APIC IDs check and ignores most > of the CPUs in X2APIC entries. > > Fix the problem by doing strict X2APIC ID check less aggressively, say > only when an enabled CPU is enumerated via LAPIC. > > Fixes: 8e9c42d776d6 ("x86/acpi: Ignore invalid x2APIC entries") > Link: https://lore.kernel.org/all/20231122221947.781812-1-jsperbeck@xxxxxxxxxx/ > Reported-by: John Sperbeck <jsperbeck@xxxxxxxxxx> > Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx> > --- > arch/x86/kernel/acpi/boot.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c > index 1a0dd80d81ac..8cc566ce486a 100644 > --- a/arch/x86/kernel/acpi/boot.c > +++ b/arch/x86/kernel/acpi/boot.c > @@ -266,6 +266,7 @@ static int __init > acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end) > { > struct acpi_madt_local_apic *processor = NULL; > + int cpu; > > processor = (struct acpi_madt_local_apic *)header; > > @@ -289,9 +290,13 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end) > * to not preallocating memory for all NR_CPUS > * when we use CPU hotplug. > */ > - acpi_register_lapic(processor->id, /* APIC ID */ > - processor->processor_id, /* ACPI ID */ > - processor->lapic_flags & ACPI_MADT_ENABLED); > + cpu = acpi_register_lapic(processor->id, /* APIC ID */ > + processor->processor_id, /* ACPI ID */ > + processor->lapic_flags & ACPI_MADT_ENABLED); > + > + /* Do strict X2APIC ID check only when an enabled CPU is enumerated via LAPIC */ > + if (cpu >= 0 ) > + has_lapic_cpus = true; > > return 0; > } > @@ -1134,7 +1139,6 @@ static int __init acpi_parse_madt_lapic_entries(void) > if (!count) { > count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC, > acpi_parse_lapic, MAX_LOCAL_APIC); > - has_lapic_cpus = count > 0; > x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC, > acpi_parse_x2apic, MAX_LOCAL_APIC); > } > -- > 2.34.1 > Yes, with that patch, the problem is gone on my system. All of the CPUs are recognized and active. Before the patch (only one CPU active): # cat /proc/cpuinfo | grep '^processor' | wc -l 1 With the patch (all CPUs active): oxco8:~# cat /proc/cpuinfo | grep '^processor' | wc -l 112