On Tue, Feb 07, 2023 at 11:04:27PM +0000, Usama Arif wrote:
> Tested on v7, doing INIT/SIPI/SIPI in parallel brings down the time for
> smpboot from ~700ms to 100ms (85% improvement) on a server with 128 CPUs
> split across 2 NUMA nodes.
>
> The major change over v6 is keeping parallel smp support enabled in AMD.
> APIC ID for parallel CPU bringup is now obtained from CPUID leaf 0x0B
> (for x2APIC mode) otherwise CPUID leaf 0x1 (8 bits).
>
> The patch for reusing timer calibration for secondary CPUs is also removed
> from the series as its not part of parallel smp bringup and needs to be
> further thought about.

Running rcutorture on this got me the following NULL pointer dereference
on scenario TREE01:

------------------------------------------------------------------------

[ 34.662066] smpboot: CPU 0 is now offline
[ 34.674075] rcu: NOCB: Cannot CB-offload offline CPU 25
[ 35.038003] rcu: De-offloading 5
[ 35.112997] rcu: Offloading 12
[ 35.716011] smpboot: Booting Node 0 Processor 0 APIC 0x0
[ 35.762685] BUG: kernel NULL pointer dereference, address: 0000000000000001
[ 35.764278] #PF: supervisor instruction fetch in kernel mode
[ 35.765530] #PF: error_code(0x0010) - not-present page
[ 35.766700] PGD 0 P4D 0
[ 35.767278] Oops: 0010 [#1] PREEMPT SMP PTI
[ 35.768223] CPU: 36 PID: 0 Comm: swapper/36 Not tainted 6.2.0-rc1-00206-g18a37610b632-dirty #3563
[ 35.770201] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

------------------------------------------------------------------------

Given an x86 system with KVM and qemu, this can be reproduced by running
the following from the top-level directory in the Linux-kernel source tree:

	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs "TREE01 TINY01" --trust-make

Out of 15 runs, 14 blew up just after the first attempt to bring CPU 0
back online.
The 15th run blew up just after the second attempt to bring
CPU 0 online, the first attempt having succeeded.

My guess is that the CONFIG_BOOTPARAM_HOTPLUG_CPU0=y Kconfig option is
tickling this bug. This Kconfig option has been added to the TREE01
scenario in the -rcu tree's "dev" branch, which might mean that this test
would pass on mainline. But CONFIG_BOOTPARAM_HOTPLUG_CPU0=y is not new,
only rcutorture's testing of it.

Thoughts?

							Thanx, Paul