On Mon, Oct 14, 2024 at 8:23 PM Zhang, Rui <rui.zhang@xxxxxxxxx> wrote: > > On Mon, 2024-10-14 at 11:00 -0700, Jim Mattson wrote: > > On Mon, Oct 14, 2024 at 6:05 AM Zhang, Rui <rui.zhang@xxxxxxxxx> > > wrote: > > > > > > > > > > > > > > > TBH, I'm not sure that there is actually anything wrong with > > > > > > the > > > > > > new > > > > > > numbering scheme. > > > > > > The topology is reported correctly (e.g. in > > > > > > /sys/devices/system/cpu/cpu0/topology/thread_siblings_list). > > > > > > Yet, > > > > > > the > > > > > > new enumeration does seem to contradict user expectations. > > > > > > > > > > > > > > > > Well, we can say this is a violation of the ACPI spec. > > > > > "OSPM should initialize processors in the order that they > > > > > appear in > > > > > the > > > > > MADT." even for interleaved LAPIC and X2APIC entries. > > > > > > > > Ah. Thanks. I didn't know that. > > > > > > > > > Maybe we need two steps for LAPIC/X2APIC parsing. > > > > > 1. check if there is valid LAPIC entry by going through all > > > > > LAPIC > > > > > entries first > > > > > 2. parse LAPIC/X2APIC strictly following the order in MADT. > > > > > (like > > > > > we do > > > > > before) > > > > > > > > That makes sense to me. > > > > > > > > Thanks, > > > > > > > > --jim > > > > > > Hi, Jim, > > > > > > Please check if below patch restores the CPU IDs or not. > > > > > > thanks, > > > rui > > > > > > From ec786dfe693cad2810b54b0d8afbfc7e4c4b3f8a Mon Sep 17 00:00:00 > > > 2001 > > > From: Zhang Rui <rui.zhang@xxxxxxxxx> > > > Date: Mon, 14 Oct 2024 13:26:55 +0800 > > > Subject: [PATCH] x86/acpi: Fix LAPIC/x2APIC parsing order > > > > > > On some systems, the same CPU (with same APIC ID) is assigned with > > > a > > > different logical CPU id after commit ec9aedb2aa1a ("x86/acpi: > > > Ignore > > > invalid x2APIC entries"). > > > > > > This means Linux enumerates the CPUs in a different order and it is > > > a > > > violation of > > > https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order > > > , > > > > > > "OSPM should initialize processors in the order that they appear > > > in > > > the MADT" > > > > > > The offending commit wants to ignore x2APIC entries with APIC ID < > > > 255 > > > when valid LAPIC entries exist, so it parses all LAPIC entries > > > before > > > parsing any x2APIC entries. This breaks the CPU enumeration order > > > for > > > systems that have x2APIC entries listed before LAPIC entries in > > > MADT. > > > > > > Fix the problem by checking the valid LAPIC entries separately, > > > before > > > parsing any LAPIC/x2APIC entries. > > > > > > Cc: stable@xxxxxxxxxxxxxxx > > > Reported-by: Jim Mattson <jmattson@xxxxxxxxxx> > > > Closes: > > > https://lore.kernel.org/all/20241010213136.668672-1-jmattson@xxxxxxxxxx/ > > > Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries") > > > Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx> > > > --- > > > arch/x86/kernel/acpi/boot.c | 50 > > > +++++++++++++++++++++++++++++++++---- > > > 1 file changed, 45 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/x86/kernel/acpi/boot.c > > > b/arch/x86/kernel/acpi/boot.c > > > index 4efecac49863..c70b86f1f295 100644 > > > --- a/arch/x86/kernel/acpi/boot.c > > > +++ b/arch/x86/kernel/acpi/boot.c > > > @@ -226,6 +226,28 @@ acpi_parse_x2apic(union acpi_subtable_headers > > > *header, const unsigned long end) > > > return 0; > > > } > > > > > > +static int __init > > > +acpi_check_lapic(union acpi_subtable_headers *header, const > > > unsigned long end) > > > +{ > > > + struct acpi_madt_local_apic *processor = NULL; > > > + > > > + processor = (struct acpi_madt_local_apic *)header; > > > + > > > + if (BAD_MADT_ENTRY(processor, end)) > > > + return -EINVAL; > > > + > > > + /* Ignore invalid ID */ > > > + if (processor->id == 0xff) > > > + return 0; > > > + > > > + /* Ignore processors that can not be onlined */ > > > + if (!acpi_is_processor_usable(processor->lapic_flags)) > > > + return 0; > > > + > > > + has_lapic_cpus = true; > > > + return 0; > > > +} > > > + > > > static int __init > > > acpi_parse_lapic(union acpi_subtable_headers * header, const > > > unsigned long end) > > > { > > > @@ -257,7 +279,6 @@ acpi_parse_lapic(union acpi_subtable_headers * > > > header, const unsigned long end) > > > processor->processor_id, /* ACPI ID > > > */ > > > processor->lapic_flags & > > > ACPI_MADT_ENABLED); > > > > > > - has_lapic_cpus = true; > > > return 0; > > > } > > > > > > @@ -1029,6 +1050,8 @@ static int __init > > > early_acpi_parse_madt_lapic_addr_ovr(void) > > > static int __init acpi_parse_madt_lapic_entries(void) > > > { > > > int count, x2count = 0; > > > + struct acpi_subtable_proc madt_proc[2]; > > > + int ret; > > > > > > if (!boot_cpu_has(X86_FEATURE_APIC)) > > > return -ENODEV; > > > @@ -1037,10 +1060,27 @@ static int __init > > > acpi_parse_madt_lapic_entries(void) > > > acpi_parse_sapic, > > > MAX_LOCAL_APIC); > > > > > > if (!count) { > > > - count = > > > acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC, > > > - acpi_parse_lapic, > > > MAX_LOCAL_APIC); > > > - x2count = > > > acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC, > > > - acpi_parse_x2apic, > > > MAX_LOCAL_APIC); > > > > The point is moot now, but I don't think the previous code did the > > right thing when acpi_table_parse_madt() returned a negative value > > (for errors). > > Previous and current code checks for the negative value later after > parsing both LAPIC and x2APIC. > so what is the problem you're referring to? > Do you mean we should error out immediately when parsing LAPIC fails? My mistake. I should have looked at the full context rather than just the context of the patch. > > > > > + /* Check if there are valid LAPIC entries */ > > > + acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC, > > > acpi_check_lapic, MAX_LOCAL_APIC); > > > > Two comments: > > > > 1) Should we check for a return value < 0 here, or just wait for one > > of the later walks to error out? > > I'm okay with both. > > > 2) It seems unfortunate to walk the entire table when the first entry > > may give you the answer, but perhaps modern systems have only X2APIC > > entries, so we will typically have to walk the entire table anyway. > > yeah. There are systems with invalid LAPIC entries first, and > acpi_parse_entries_array() doesn't support graceful early termination, > so we have to check all the entries. > > > > > Reviewed-and-tested-by: Jim Mattson <jmattson@xxxxxxxxxx> > > Thanks. I will submit the current version to keep your tags. > > -rui Thanks! --jim