This is a cut-down version of parallel CPU bringup for x86_64, which only does the INIT/SIPI/SIPI for APs in a CPUHP_BP_PARALLEL_DYN stage before the normal bringup to CPUHP_ONLINE happens sequentially.

Thus, we don't yet need any of the cleanups in RCU, TSC sync, topology updates, etc. We only need to handle reentrancy through the real mode trampoline and the beginning of start_secondary() up to the point where it waits in wait_for_master_cpu().

This much is simple and sane enough to be merged, I think, modulo the lack of sign-off on the patch that Thomas now claims not to remember writing :)

This brings the 96-thread 2-socket Skylake startup time from 500ms to 100ms, which is a bit more modest than the 34ms we claimed before, but still a nice win.

Further testing and analysis have shown us that allowing the APs to proceed from wait_for_master_cpu() in parallel is going to require a bit more thought.

Once the APs reach smp_callin(), they call notify_cpu_starting(), which walks through the states up to min(st->target, CPUHP_AP_ONLINE_IDLE). But if we allow the AP to get there when its target is one of the CPUHP_BP_PARALLEL_DYN states, notify_cpu_starting() doesn't walk it through any states at all! And then when the AP gets to the end of start_secondary() it ends up in cpu_startup_entry(), which *sets* the state to CPUHP_AP_ONLINE_IDLE and thus has effectively *skipped* all the CPUHP_*_STARTING states.

The cheap answer is to explicitly walk to CPUHP_AP_ONLINE_IDLE, but I don't want to let the APs *overtake* the target set for them by the overall CPUHP state machine. So I think the better solution for further parallelisation is to make bringup_nonboot_cpus() bring all the APs to CPUHP_AP_ONLINE_IDLE in parallel, and *then* bring them to CPUHP_ONLINE.
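
To make the skipped-states problem concrete, here is a toy userspace model in C. It is only a sketch under stated assumptions: every model_* name and the shortened state list are hypothetical stand-ins invented for illustration, not the real enum from include/linux/cpuhotplug.h or the real kernel functions. It only mimics the min(st->target, CPUHP_AP_ONLINE_IDLE) walk described above.

```c
#include <assert.h>

/* Hypothetical, heavily shortened stand-in for the hotplug state list. */
enum model_state {
	MODEL_OFFLINE = 0,
	MODEL_BP_PARALLEL_DYN,	/* stands in for the CPUHP_BP_PARALLEL_DYN range */
	MODEL_BRINGUP_CPU,
	MODEL_AP_STARTING_A,	/* stand-ins for the CPUHP_*_STARTING states */
	MODEL_AP_STARTING_B,
	MODEL_AP_ONLINE_IDLE,
	MODEL_ONLINE,
};

struct model_cpu {
	enum model_state state;		/* state the CPU has reached */
	enum model_state target;	/* target set by the state machine */
	int starting_callbacks_run;	/* STARTING callbacks actually invoked */
};

/* Models the walk up to min(target, AP_ONLINE_IDLE) on the AP itself. */
static void model_notify_cpu_starting(struct model_cpu *cpu)
{
	enum model_state limit = cpu->target < MODEL_AP_ONLINE_IDLE ?
				 cpu->target : MODEL_AP_ONLINE_IDLE;

	while (cpu->state < limit) {
		cpu->state = (enum model_state)(cpu->state + 1);
		if (cpu->state == MODEL_AP_STARTING_A ||
		    cpu->state == MODEL_AP_STARTING_B)
			cpu->starting_callbacks_run++;
	}
}

/* Models the idle entry path forcing the state, with no walk at all. */
static void model_cpu_startup_entry(struct model_cpu *cpu)
{
	cpu->state = MODEL_AP_ONLINE_IDLE;
}

/*
 * Run the AP side of the model from a given starting state and target,
 * and report how many STARTING callbacks were invoked on the way.
 */
static int model_boot_ap(enum model_state start, enum model_state target)
{
	struct model_cpu cpu = { start, target, 0 };

	assert(start <= target);
	model_notify_cpu_starting(&cpu);
	model_cpu_startup_entry(&cpu);
	assert(cpu.state == MODEL_AP_ONLINE_IDLE);
	return cpu.starting_callbacks_run;
}
```

In this model, model_boot_ap(MODEL_BRINGUP_CPU, MODEL_ONLINE) runs both STARTING callbacks, while model_boot_ap(MODEL_BP_PARALLEL_DYN, MODEL_BP_PARALLEL_DYN) runs none even though the CPU still ends up marked as MODEL_AP_ONLINE_IDLE. That is the skip described above.
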
We will continue to play with that one and make sure the rest of the startup states are reentrant, in addition to the ones we've already fixed in
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-5.16

v2: Only do do_cpu_up() for APs in parallel, nothing more. Drop half the fixes that aren't yet needed until we go further.

David Woodhouse (6):
      x86/apic/x2apic: Fix parallel handling of cluster_mask
      cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
      cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU
      x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
      x86/smpboot: Split up native_cpu_up into separate phases and document them
      x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel

Thomas Gleixner (1):
      x86/smpboot: Support parallel startup of secondary CPUs

 arch/x86/include/asm/realmode.h       |   3 +
 arch/x86/include/asm/smp.h            |   9 +-
 arch/x86/kernel/acpi/sleep.c          |   1 +
 arch/x86/kernel/apic/apic.c           |   2 +-
 arch/x86/kernel/apic/x2apic_cluster.c |  82 ++++++-----
 arch/x86/kernel/head_64.S             |  71 ++++++++++
 arch/x86/kernel/smpboot.c             | 251 +++++++++++++++++++++++++---------
 arch/x86/realmode/init.c              |   3 +
 arch/x86/realmode/rm/trampoline_64.S  |  14 ++
 include/linux/cpuhotplug.h            |   2 +
 include/linux/smpboot.h               |   7 +
 kernel/cpu.c                          |  27 +++-
 kernel/smpboot.c                      |   2 +-
 kernel/smpboot.h                      |   2 -
 14 files changed, 371 insertions(+), 105 deletions(-)