On 5/12/2023 3:07 PM, Thomas Gleixner wrote:
From: David Woodhouse <dwmw@xxxxxxxxxxxx>
In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.
In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPU's per CPU offset in
order to find the idle task stack and other per CPU data.
To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.
Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.
This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignments. The only requirement is that the
APIC IDs are consistent with the ACPI/MADT table.
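As a rough illustration of the lookup described above, here is a minimal user-space sketch in C. The table contents, NR_CPUS_SKETCH, cpuid_to_apicid_sketch[] and find_cpu_by_apicid() are hypothetical names chosen for this example; the actual lookup happens in the early startup code and is not reproduced here. The table uses APIC IDs offset by 0x10 to mirror the AMD APU case, so a search keyed on the initial APIC ID (e.g. 0x02) finds nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size for this sketch only. */
#define NR_CPUS_SKETCH 4

/*
 * Hypothetical stand-in for cpuid_to_apicid[]: the index is the Linux CPU
 * number, the value is the APIC ID as programmed by firmware
 * (here: initial APIC ID + 0x10).
 */
static const uint32_t cpuid_to_apicid_sketch[NR_CPUS_SKETCH] = {
	0x10, 0x11, 0x12, 0x13,
};

/*
 * Linear search for the CPU number that owns the given APIC ID.
 * Returns the CPU number, or -1 if the APIC ID is not in the table.
 */
static int find_cpu_by_apicid(uint32_t apicid)
{
	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpuid_to_apicid_sketch[cpu] == apicid)
			return (int)cpu;
	}
	return -1;
}
```

With this table, find_cpu_by_apicid(0x12) yields CPU 2, while the initial APIC ID 0x02 yields -1, which is the failure mode the commit message describes for a CPUID based lookup.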
For the boot CPU or in case parallel bringup is disabled the control bits
are empty and the CPU number is directly available in bits 0-23 of
smpboot_control.
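The two cases above could be decoded roughly as follows. This is only a sketch: the commit message states that the CPU number occupies bits 0-23 and that the flag sits in the top bits, but the exact flag position (bit 31 here), the mask covering bits 24-31, and all SKETCH_* and function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CPU_MASK      0x00FFFFFFu /* bits 0-23, per the text above */
#define SKETCH_CONTROL_MASK  0xFF000000u /* assumption: control bits 24-31 */
#define SKETCH_READ_APICID   0x80000000u /* assumption: flag in bit 31 */

/* True if the AP is expected to read its APIC ID and look up its CPU number. */
static bool must_read_apicid(uint32_t ctrl)
{
	return (ctrl & SKETCH_READ_APICID) != 0;
}

/*
 * Returns the CPU number stored directly in ctrl when no control bits are
 * set (boot CPU or parallel bringup disabled), or -1 when a lookup is needed.
 */
static int direct_cpu_number(uint32_t ctrl)
{
	if (ctrl & SKETCH_CONTROL_MASK)
		return -1;
	return (int)(ctrl & SKETCH_CPU_MASK);
}
```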
[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in
x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID,
split the bitlock part out ]
Co-developed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Co-developed-by: Brian Gerst <brgerst@xxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Tested-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
---
I pulled in this change via the next tree (tag next-20230519), and I get a
build failure with x86_64_defconfig:
DESCEND objtool
INSTALL libsubcmd_headers
CALL scripts/checksyscalls.sh
AS arch/x86/kernel/head_64.o
arch/x86/kernel/head_64.S: Assembler messages:
arch/x86/kernel/head_64.S:261: Error: missing ')'
arch/x86/kernel/head_64.S:261: Error: junk `UL<<10)' after expression
CC arch/x86/kernel/head64.o
CC arch/x86/kernel/ebda.o
CC arch/x86/kernel/platform-quirks.o
scripts/Makefile.build:374: recipe for target 'arch/x86/kernel/head_64.o' failed
make[3]: *** [arch/x86/kernel/head_64.o] Error 1
make[3]: *** Waiting for unfinished jobs....
scripts/Makefile.build:494: recipe for target 'arch/x86/kernel' failed
make[2]: *** [arch/x86/kernel] Error 2
scripts/Makefile.build:494: recipe for target 'arch/x86' failed
make[1]: *** [arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
Makefile:2026: recipe for target '.' failed
make: *** [.] Error 2
This is with GCC 5.4.0, if it matters.
Reverting this change allows the build to move forward, although I also
need to revert "x86/smpboot/64: Implement
arch_cpuhp_init_parallel_bringup() and enable it" for the build to fully
succeed.
I'm not familiar with this code, and nothing obvious stands out to me.
What can I do to help find the root cause?
-Jeff