Dear David,
On 27.12.21 at 17:57, Paul Menzel wrote:
On 15.12.21 at 15:56, David Woodhouse wrote:
Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
them shaves about 80% off the AP bringup time on a 96-thread socket
Skylake box (EC2 c5.metal) — from about 500ms to 100ms.
There are more wins to be had with further parallelisation, but this is
the simple part.
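Here is a toy cost model in C that shows why kicking all APs first and *then* waiting helps. It is not kernel code. The function names and the 4000 us / 1000 us per-AP figures are invented for illustration and are not measurements from the c5.metal box. The model assumes that the wait after INIT/SIPI/SIPI can overlap across APs, while the rest of each AP's bringup still runs one CPU at a time.

```c
/* Hypothetical toy cost model, not kernel code.
 * sipi_wait_us: delay after INIT/SIPI/SIPI before an AP responds (can overlap)
 * serial_us:    remaining per-AP bringup work that still runs one CPU at a time */
static unsigned long serial_bringup_us(unsigned long n_aps,
                                       unsigned long sipi_wait_us,
                                       unsigned long serial_us)
{
    /* Kick one AP, wait for it, finish it, then start on the next one. */
    return n_aps * (sipi_wait_us + serial_us);
}

static unsigned long parallel_kick_bringup_us(unsigned long n_aps,
                                              unsigned long sipi_wait_us,
                                              unsigned long serial_us)
{
    /* Send INIT/SIPI/SIPI to every AP up front: the waits overlap and are
     * paid roughly once, while the remaining per-AP work stays serial. */
    if (n_aps == 0)
        return 0;
    return sipi_wait_us + n_aps * serial_us;
}

static unsigned long percent_saved(unsigned long before, unsigned long after)
{
    /* Integer percentage reduction; 0 if there is no improvement. */
    if (before == 0 || after >= before)
        return 0;
    return (before - after) * 100 / before;
}
```

With 95 APs (96 threads minus the boot CPU), serial_bringup_us(95, 4000, 1000) gives 475000 us and parallel_kick_bringup_us(95, 4000, 1000) gives 99000 us, which percent_saved() reports as 79%. The only point is that the overlapped wait is paid once instead of once per AP. The remaining serial term is what the "further parallelisation" would go after.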
v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
in preparation for more parallelisation.
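The MTRR optimisation can be illustrated as a save-once pattern. This is a minimal sketch under assumptions, not the patch itself. All names here are hypothetical, and it assumes, as the patch title "Avoid repeated save of MTRRs on boot-time CPU bringup" suggests, that the saved state cannot change between AP bringups while the system is still booting.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel code is structured differently. */
static unsigned int mtrr_save_calls;  /* counts the expensive save operations */
static bool boot_mtrrs_saved;         /* set after the first boot-time save */

static void save_bsp_mtrrs(void)
{
    /* Stand-in for snapshotting the boot CPU's MTRRs for an AP to copy. */
    mtrr_save_calls++;
}

/* Called once per AP before that AP is started. */
static void mtrr_save_state_sketch(bool system_booting)
{
    /* Assumption: during boot-time bringup the snapshot stays valid,
     * so it is taken once and reused for every following AP. */
    if (system_booting) {
        if (boot_mtrrs_saved)
            return;
        boot_mtrrs_saved = true;
    }
    /* Later hotplug (or the first boot-time AP) still does a fresh save. */
    save_bsp_mtrrs();
}
```

Bringing up four APs at boot therefore costs one save instead of four, while a later runtime hotplug still takes a fresh snapshot.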
David Woodhouse (8):
x86/apic/x2apic: Fix parallel handling of cluster_mask
cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU
x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
x86/smpboot: Split up native_cpu_up into separate phases and document them
x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup
x86/smpboot: Serialize topology updates for secondary bringup
Thomas Gleixner (1):
x86/smpboot: Support parallel startup of secondary CPUs
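"Reference count on smpboot_setup_warm_reset_vector()" matters once several APs are in flight at the same time. The vector must stay programmed until the last of them has started. A minimal sketch of that idea follows, with hypothetical names and plain variables standing in for the real state. The real code programs the CMOS shutdown code and the BIOS warm reset vector. The sketch also assumes the callers are serialized (for example, all running on the boot CPU); otherwise the counter would need to be atomic or protected by a lock.

```c
#include <stdbool.h>

/* Hypothetical model of the warm reset vector state. */
static unsigned int warm_reset_users;
static bool warm_reset_vector_set;

static void setup_warm_reset_vector_sketch(void)
{
    /* Only the first AP in flight actually programs the vector. */
    if (warm_reset_users++ == 0)
        warm_reset_vector_set = true;
}

static void restore_warm_reset_vector_sketch(void)
{
    /* Only the last AP to finish tears it down, so an AP that is still
     * booting never has the vector cleared underneath it. The check on
     * warm_reset_users guards against an unbalanced extra restore. */
    if (warm_reset_users > 0 && --warm_reset_users == 0)
        warm_reset_vector_set = false;
}
```

With a plain set/clear pair instead, the first AP to finish would clear the vector while its siblings were still using it.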
arch/x86/include/asm/realmode.h | 3 +
arch/x86/include/asm/smp.h | 13 +-
arch/x86/include/asm/topology.h | 2 -
arch/x86/kernel/acpi/sleep.c | 1 +
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/apic/x2apic_cluster.c | 108 +++++++-----
arch/x86/kernel/cpu/common.c | 6 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +
arch/x86/kernel/head_64.S | 71 ++++++++
arch/x86/kernel/smpboot.c | 324 ++++++++++++++++++++++++----------
arch/x86/realmode/init.c | 3 +
arch/x86/realmode/rm/trampoline_64.S | 14 ++
arch/x86/xen/smp_pv.c | 4 +-
include/linux/cpuhotplug.h | 2 +
include/linux/smpboot.h | 7 +
kernel/cpu.c | 27 ++-
kernel/smpboot.c | 2 +-
kernel/smpboot.h | 2 -
18 files changed, 441 insertions(+), 159 deletions(-)
Thank you for working on this. I tested this on an MSI MS-7A37/B350M
MORTAR (BIOS 1.MW 11/01/2021) with a Ryzen 3 2200G, but nothing was
printed to the screen after the GRUB loading messages, so it crashed or
hung somewhere. Unfortunately, this device is used by others, no serial
console is connected, and I do not know how to capture the Linux log by
other means.
Same on the ASUS F2A85-M PRO with AMD A6-6400K. Without a serial console,
the messages below are printed to the monitor after nine seconds.
[ 1.078879] smp: Bringing up secondary CPUs ...
[ 1.080950] x86: Booting SMP configuration:
Please find the serial log attached.
Kind regards,
Paul
[ 0.000000] Linux version 5.16.0-rc7-00106-gcc498e0c43be (root@45e877da5b3e) (gcc (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37) #245 SMP PREEMPT Tue Dec 28 10:00:33 UTC 2021
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0-rc7-00106-gcc498e0c43be root=/dev/sda3 rw debug noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0 console=ttyS0,115200 console=tty1 earlyprintk=serial,ttyS0,115200,keep
[ 0.000000] random: get_random_u32 called from bsp_init_amd+0x142/0x210 with crng_init=0
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] signal: max sigframe size: 1776
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000005fe45fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000005fe46000-0x000000007fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017effffff] usable
[ 0.000000] printk: console [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 3.0.0 present.
[ 0.000000] DMI: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.15-676-g90cfb8f5ef 12/28/2021
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Initial usec timer 20439600
[ 0.000000] tsc: Detected 3900.178 MHz processor
[ 0.000588] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.007106] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.012655] last_pfn = 0x17f000 max_arch_pfn = 0x400000000
[ 0.018249] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
Memory KASLR using RDTSC...
[ 0.027700] last_pfn = 0x5fe46 max_arch_pfn = 0x400000000
[ 0.036861] Using GB pages for direct mapping
[ 0.041210] ACPI: Early table checksum verification disabled
[ 0.046691] ACPI: RSDP 0x00000000000F6250 000024 (v02 COREv4)
[ 0.052409] ACPI: XSDT 0x000000005FE4C0E0 000074 (v01 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.060905] ACPI: FACP 0x000000005FE4DBC0 000114 (v06 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.069398] ACPI: DSDT 0x000000005FE4C280 00193A (v02 COREv4 COREBOOT 00010001 INTL 20200925)
[ 0.077890] ACPI: FACS 0x000000005FE4C240 000040
[ 0.082483] ACPI: FACS 0x000000005FE4C240 000040
[ 0.087077] ACPI: SSDT 0x000000005FE4DCE0 00008A (v02 COREv4 COREBOOT 0000002A CORE 20200925)
[ 0.095570] ACPI: MCFG 0x000000005FE4DD70 00003C (v01 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.104064] ACPI: APIC 0x000000005FE4DDB0 000062 (v03 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.112557] ACPI: HPET 0x000000005FE4DE20 000038 (v01 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.121051] ACPI: HEST 0x000000005FE4DE60 0001D0 (v01 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.129544] ACPI: IVRS 0x000000005FE4E030 000070 (v02 AMD AMDIOMMU 00000001 AMD 00000000)
[ 0.138037] ACPI: SSDT 0x000000005FE4E0A0 00051F (v02 AMD ALIB 00000001 MSFT 04000000)
[ 0.146531] ACPI: SSDT 0x000000005FE4E5C0 0006B2 (v01 AMD POWERNOW 00000001 AMD 00000001)
[ 0.155025] ACPI: VFCT 0x000000005FE4EC80 00F269 (v01 COREv4 COREBOOT 00000000 CORE 20200925)
[ 0.163517] ACPI: Reserving FACP table memory at [mem 0x5fe4dbc0-0x5fe4dcd3]
[ 0.170537] ACPI: Reserving DSDT table memory at [mem 0x5fe4c280-0x5fe4dbb9]
[ 0.177558] ACPI: Reserving FACS table memory at [mem 0x5fe4c240-0x5fe4c27f]
[ 0.184578] ACPI: Reserving FACS table memory at [mem 0x5fe4c240-0x5fe4c27f]
[ 0.191598] ACPI: Reserving SSDT table memory at [mem 0x5fe4dce0-0x5fe4dd69]
[ 0.198619] ACPI: Reserving MCFG table memory at [mem 0x5fe4dd70-0x5fe4ddab]
[ 0.205638] ACPI: Reserving APIC table memory at [mem 0x5fe4ddb0-0x5fe4de11]
[ 0.212659] ACPI: Reserving HPET table memory at [mem 0x5fe4de20-0x5fe4de57]
[ 0.219679] ACPI: Reserving HEST table memory at [mem 0x5fe4de60-0x5fe4e02f]
[ 0.226699] ACPI: Reserving IVRS table memory at [mem 0x5fe4e030-0x5fe4e09f]
[ 0.233719] ACPI: Reserving SSDT table memory at [mem 0x5fe4e0a0-0x5fe4e5be]
[ 0.240740] ACPI: Reserving SSDT table memory at [mem 0x5fe4e5c0-0x5fe4ec71]
[ 0.247760] ACPI: Reserving VFCT table memory at [mem 0x5fe4ec80-0x5fe5dee8]
[ 0.254835] No NUMA configuration found
[ 0.258593] Faking a node at [mem 0x0000000000000000-0x000000017effffff]
[ 0.265273] NODE_DATA(0) allocated [mem 0x17efe7000-0x17effdfff]
[ 0.283316] Zone ranges:
[ 0.285678] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.291830] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.297984] Normal [mem 0x0000000100000000-0x000000017effffff]
[ 0.304138] Device empty
[ 0.306998] Movable zone start for each node
[ 0.311245] Early memory node ranges
[ 0.314798] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.321039] node 0: [mem 0x0000000000100000-0x000000005fe45fff]
[ 0.327278] node 0: [mem 0x0000000100000000-0x000000017effffff]
[ 0.333520] Initmem setup node 0 [mem 0x0000000000001000-0x000000017effffff]
[ 0.340544] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.340602] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.359000] On node 0, zone Normal: 442 pages in unavailable ranges
[ 0.364808] On node 0, zone Normal: 4096 pages in unavailable ranges
[ 0.371106] ACPI: PM-Timer IO Port: 0x818
[ 0.381304] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[ 0.387198] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
[ 0.394039] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.400367] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.406868] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.413280] ACPI: HPET id: 0x10228210 base: 0xfed00000
[ 0.418398] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.423333] smpboot: smpboot: XXX end of prefill_possible_map
[ 0.429053] After prefill_possible_map
[ 0.432781] After init_cpu_to_node
[ 0.436160] After init_gi_nodes
[ 0.439281] After io_apic_init_mappings
[ 0.443094] After x86_init.hyper.guest_late_init
[ 0.447696] [mem 0x80000000-0xf7ffffff] available for PCI devices
[ 0.453754] After e820
[ 0.456096] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 0.470502] After unwind_init
[ 0.473298] After setup_arch
[ 0.476169] After setup_command_line
[ 0.479711] After setup_nr_cpu_ids
[ 0.483091] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1
[ 0.491127] percpu: Embedded 54 pages/cpu s182040 r8192 d30952 u1048576
[ 0.497579] pcpu-alloc: s182040 r8192 d30952 u1048576 alloc=1*2097152
[ 0.503978] pcpu-alloc: [0] 0 1
[ 0.507209] After setup_per_cpu_areas
[ 0.510826] After smp_perpare_boot_cpu
[ 0.514553] After boot_cpu_hotplug_init
[ 0.518368] Fallback order for Node 0: 0
[ 0.522352] Built 1 zonelists, mobility grouping on. Total pages: 898444
[ 0.529113] Policy zone: Normal
[ 0.532233] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0-rc7-00106-gcc498e0c43be root=/dev/sda3 rw debug noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0 console=ttyS0,115200 console=tty1 earlyprintk=serial,ttyS0,115200,keep
[ 0.553561] Unknown kernel command line parameters "noisapnp BOOT_IMAGE=/boot/vmlinuz-5.16.0-rc7-00106-gcc498e0c43be", will be passed to user space.
[ 0.567513] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.575640] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[ 0.583229] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.629839] Memory: 3483292K/3651472K available (14344K kernel code, 2321K rwdata, 4212K rodata, 1692K init, 6332K bss, 167920K reserved, 0K cma-reserved)
[ 0.643901] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.650282] After mm_init
[ 0.652850] ftrace: allocating 35324 entries in 138 pages
[ 0.670177] ftrace: allocated 138 pages with 3 groups
[ 0.675169] Dynamic Preempt: full
[ 0.678348] After sched_init
[ 0.681282] rcu: Preemptible hierarchical RCU implementation.
[ 0.686928] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2.
[ 0.693515] Trampoline variant of Tasks RCU enabled.
[ 0.698541] Rude variant of Tasks RCU enabled.
[ 0.703048] Tracing variant of Tasks RCU enabled.
[ 0.707815] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[ 0.715441] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.722125] After rcu_init
[ 0.734046] NR_IRQS: 4352, nr_irqs: 440, preallocated irqs: 16
[ 0.740046] rcu: Offload RCU callbacks from CPUs: (none).
[ 0.745386] random: crng_init_try_arch_early failed with i = 4, X86_FEATURE_RDRAND = no
[ 0.745388] random: crng_init_try_arch_early failed with i = 5, X86_FEATURE_RDRAND = no
[ 0.753324] random: crng_init_try_arch_early failed with i = 6, X86_FEATURE_RDRAND = no
[ 0.761299] random: crng_init_try_arch_early failed with i = 7, X86_FEATURE_RDRAND = no
[ 0.769272] random: crng_init_try_arch_early failed with i = 8, X86_FEATURE_RDRAND = no
[ 0.777245] random: crng_init_try_arch_early failed with i = 9, X86_FEATURE_RDRAND = no
[ 0.785218] random: crng_init_try_arch_early failed with i = 10, X86_FEATURE_RDRAND = no
[ 0.793192] random: crng_init_try_arch_early failed with i = 11, X86_FEATURE_RDRAND = no
[ 0.801252] random: crng_init_try_arch_early failed with i = 12, X86_FEATURE_RDRAND = no
[ 0.809313] random: crng_init_try_arch_early failed with i = 13, X86_FEATURE_RDRAND = no
[ 0.817372] random: crng_init_try_arch_early failed with i = 14, X86_FEATURE_RDRAND = no
[ 0.825432] random: crng_init_try_arch_early failed with i = 15, X86_FEATURE_RDRAND = no
[ 0.833494] After add_latent_entropy
[ 0.845109] After add_device_randomness
[ 0.848921] After boot_init_stack_canary
[ 0.852875] spurious 8259A interrupt: IRQ7.
[ 0.854860] Console: colour VGA+ 80x25
[ 0.866354] printk: console [tty1] enabled
[ 0.870342] ACPI: Core revision 20210930
[ 0.874423] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[ 0.883411] APIC: Switch to symmetric I/O mode setup
[ 0.923446] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.933411] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x7070070e77e, max_idle_ns: 881591209168 ns
[ 0.943779] Calibrating delay loop (skipped), value calculated using timer frequency.. 7800.35 BogoMIPS (lpj=3900178)
[ 0.944776] pid_max: default: 32768 minimum: 301
[ 0.945884] LSM: Security Framework initializing
[ 0.946890] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 0.947791] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
Poking KASLR using RDTSC...
[ 0.952654] Bit 30 in CPUID ECX not set.
[ 0.952681] Last level iTLB entries: 4KB 512, 2MB 1024, 4MB 512
[ 0.953775] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 512, 1GB 0
[ 0.954780] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.955776] Spectre V2 : Mitigation: Full AMD retpoline
[ 0.956775] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.957776] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.958776] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[ 0.963298] Freeing SMP alternatives memory: 40K
[ 0.963777] After check_bugs
[ 0.964776] After acpi_subsystem_init
[ 0.965776] After arch_post_acpi_subsys_init
[ 0.966776] After rcu_scheduler_starting
[ 0.967851] After find_task_by_pid_ns and PF_NO_SETAFFINITY
[ 0.968781] After numa_default_policy
[ 0.969801] After rcu_read_lock
[ 0.970775] After rcu_read_unlock
[ 0.971776] After kthreadd_done
[ 0.972786] smpboot: Start of smp_prepare_cpus_common
[ 0.973777] smpboot: smpboot: zalloc 0
[ 0.974776] smpboot: smpboot: zalloc 1
[ 0.975775] smpboot: smpboot: After set_sched_topology()
[ 0.976777] smpboot: smpboot: After smp_sanity_check()
[ 0.977775] smpboot: smpboot: Before x86_init.timers.setup_percpu_clockev()
[ 0.997775] random: random: 1
[ 0.998775] random: random: 2
[ 0.998775] random: random: 3
[ 0.998775] random: random: 4
[ 1.061775] random: random: 1
[ 1.062775] random: random: 2
[ 1.062775] random: random: 3
[ 1.062775] random: random: 4
[ 1.062808] APIC calibration not consistent with PM-Timer: 102ms instead of 100ms
[ 1.063775] APIC delta adjusted to PM-Timer: 625036 (640760)
[ 1.063780] smpboot: smpboot: After x86_init.timers.setup_percpu_clockev()
[ 1.064775] smpboot: smp_get_logical_apicid()
[ 1.065775] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD Graphics (family: 0x15, model: 0x13, stepping: 0x1)
[ 1.067103] Performance Events: Fam15h core perfctr, AMD PMU driver.
[ 1.067777] ... version: 0
[ 1.068775] ... bit width: 48
[ 1.069775] ... generic registers: 6
[ 1.070777] ... value mask: 0000ffffffffffff
[ 1.071775] ... max period: 00007fffffffffff
[ 1.072775] ... fixed-purpose events: 0
[ 1.073775] ... event mask: 000000000000003f
[ 1.075812] rcu: Hierarchical SRCU implementation.
[ 1.078397] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 1.078879] smp: Bringing up secondary CPUs ...
[ 1.080950] x86: Booting SMP configuration: