Since apic.c:apic_ops is not guarded against concurrent accesses, there exists a race between reset_apic(), enable_apic() and enable_x2apic() which results in APs crashing or getting blocked in various scenarios (eg, enabling x2apic while disabling xapic). The bug is rare with vcpu count < 32, but becomes easier to reproduce with vcpus > 64 and the following thunk: lib/x86/apic.c: void enable_apic(void) { - printf("enabling apic\n"); xapic_write(APIC_SPIV, 0x1ff); } Serialize the bringup code in ap_start64 to fix this. Signed-off-by: Varad Gautam <varad.gautam@xxxxxxxx> Link: https://lore.kernel.org/kvm/20220406124002.13741-1-varad.gautam@xxxxxxxx/ --- Note that this is a C port of 20220406124002.13741-1-varad.gautam@xxxxxxxx which is not present upstream. I can squash it into the previous patch once the asm version is upstream. lib/x86/setup.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/x86/setup.c b/lib/x86/setup.c index 261fd9b..b08290a 100644 --- a/lib/x86/setup.c +++ b/lib/x86/setup.c @@ -16,6 +16,9 @@ #include "asm/setup.h" #include "processor.h" #include "atomic.h" +#include "asm/spinlock.h" + +struct spinlock ap_lock; extern char edata; extern unsigned char online_cpus[(MAX_TEST_CPUS + 7) / 8]; @@ -371,12 +374,14 @@ void save_id(void) void ap_start64(void) { + spin_lock(&ap_lock); reset_apic(); load_idt(); setup_gdt_tss(); save_id(); enable_apic(); enable_x2apic(); + spin_unlock(&ap_lock); sti(); atomic_fetch_inc(&cpu_online_count); asm volatile("1: hlt; jmp 1b"); -- 2.32.0