On 26 February 2023 20:59:17 GMT, Usama Arif <usama.arif@xxxxxxxxxxxxx> wrote:
>
>
>On 26/02/2023 18:31, Oleksandr Natalenko wrote:
>> Hello.
>>
>> On Sunday, 26 February 2023 12:07:51 CET Usama Arif wrote:
>>> The main code change over v11 is the build error fix by Brian Gerst and
>>> acquiring tr_lock in trampoline_64.S whenever the stack is setup.
>>>
>>> The git history is also rewritten to move the commits that removed
>>> initial_stack, early_gdt_descr and initial_gs earlier in the patchset.
>>>
>>> Thanks,
>>> Usama
>>>
>>> Changes across versions:
>>> v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
>>> v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
>>>     in preparation for more parallelisation.
>>> v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
>>>     avoid scribbling on initial_gs in common_cpu_up(), and to allow all
>>>     24 bits of the physical X2APIC ID to be used. That patch still needs
>>>     a Signed-off-by from its original author, who once claimed not to
>>>     remember writing it at all. But now we've fixed it, hopefully he'll
>>>     admit it now :)
>>> v5: rebase to v6.1 and remeasure performance, disable parallel bringup
>>>     for AMD CPUs.
>>> v6: rebase to v6.2-rc6, disabled parallel boot on amd as a cpu bug and
>>>     reused timer calibration for secondary CPUs.
>>> v7: [David Woodhouse] iterate over all possible CPUs to find any existing
>>>     cluster mask in alloc_clustermask. (patch 1/9)
>>>     Keep parallel AMD support enabled in AMD, using APIC ID in CPUID leaf
>>>     0x0B (for x2APIC mode) or CPUID leaf 0x01 where 8 bits are sufficient.
>>>     Included sanity checks for APIC id from 0x0B. (patch 6/9)
>>>     Removed patch for reusing timer calibration for secondary CPUs.
>>>     commit message and code improvements.
>>> v8: Fix CPU0 hotplug by setting up the initial_gs, initial_stack and
>>>     early_gdt_descr.
>>>     Drop trampoline lock and bail if APIC ID not found in find_cpunr.
>>>     Code comments improved and debug prints added.
>>> v9: Drop patch to avoid repeated saves of MTRR at boot time.
>>>     rebased and retested at v6.2-rc8.
>>>     added kernel doc for no_parallel_bringup and made do_parallel_bringup
>>>     __ro_after_init.
>>> v10: Fixed suspend/resume not working with parallel smpboot.
>>>      rebased and retested to 6.2.
>>>      fixed checkpatch errors.
>>> v11: Added patches from Brian Gerst to remove the global variables initial_gs,
>>>      initial_stack, and early_gdt_descr from the 64-bit boot code
>>>      (https://lore.kernel.org/all/20230222221301.245890-1-brgerst@xxxxxxxxx/).
>>> v12: Fixed compilation errors, acquire tr_lock for every stack setup in
>>>      trampoline_64.S.
>>>      Rearranged commits for a cleaner git history.
>>>
>>> Brian Gerst (3):
>>>   x86/smpboot: Remove initial_stack on 64-bit
>>>   x86/smpboot: Remove early_gdt_descr on 64-bit
>>>   x86/smpboot: Remove initial_gs
>>>
>>> David Woodhouse (8):
>>>   x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
>>>   cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
>>>   cpu/hotplug: Add dynamic parallel bringup states before
>>>     CPUHP_BRINGUP_CPU
>>>   x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
>>>   x86/smpboot: Split up native_cpu_up into separate phases and document
>>>     them
>>>   x86/smpboot: Support parallel startup of secondary CPUs
>>>   x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
>>>   x86/smpboot: Serialize topology updates for secondary bringup
>>>
>>>  .../admin-guide/kernel-parameters.txt |   3 +
>>>  arch/x86/include/asm/processor.h      |   6 +-
>>>  arch/x86/include/asm/realmode.h       |   4 +-
>>>  arch/x86/include/asm/smp.h            |  15 +-
>>>  arch/x86/include/asm/topology.h       |   2 -
>>>  arch/x86/kernel/acpi/sleep.c          |  15 +-
>>>  arch/x86/kernel/apic/apic.c           |   2 +-
>>>  arch/x86/kernel/apic/x2apic_cluster.c | 126 ++++---
>>>  arch/x86/kernel/asm-offsets.c         |   1 +
>>>  arch/x86/kernel/cpu/common.c          |   6 +-
>>>  arch/x86/kernel/head_64.S             | 129 +++++--
>>>  arch/x86/kernel/smpboot.c             | 350 +++++++++++++-----
>>>  arch/x86/realmode/init.c              |   3 +
>>>  arch/x86/realmode/rm/trampoline_64.S  |  27 +-
>>>  arch/x86/xen/smp_pv.c                 |   4 +-
>>>  arch/x86/xen/xen-head.S               |   2 +-
>>>  include/linux/cpuhotplug.h            |   2 +
>>>  include/linux/smpboot.h               |   7 +
>>>  kernel/cpu.c                          |  31 +-
>>>  kernel/smpboot.h                      |   2 -
>>>  20 files changed, 537 insertions(+), 200 deletions(-)
>>
>> With `CONFIG_FORCE_NR_CPUS=y` this results in:
>>
>> ```
>> ld: vmlinux.o: in function `secondary_startup_64_no_verify':
>> (.head.text+0x10c): undefined reference to `nr_cpu_ids'
>> ```
>>
>> That's because in `arch/x86/kernel/head_64.S` `secondary_startup_64_no_verify()` refers to `nr_cpu_ids` under `#ifdef CONFIG_SMP`, but this symbol is available under the following conditions:
>>
>> ```
>> #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
>> #define nr_cpu_ids ((unsigned int)NR_CPUS)
>> #else
>> extern unsigned int nr_cpu_ids;
>> #endif
>>
>> #if (NR_CPUS > 1) && !defined(CONFIG_FORCE_NR_CPUS)
>> /* Setup number of possible processor ids */
>> unsigned int nr_cpu_ids __read_mostly = NR_CPUS;
>> EXPORT_SYMBOL(nr_cpu_ids);
>> #endif
>> ```
>>
>> So having `CONFIG_SMP=y` and, for instance, `CONFIG_NR_CPUS=320`, it is possible to compile out `EXPORT_SYMBOL(nr_cpu_ids);` if `CONFIG_FORCE_NR_CPUS=y` is set.
>>
>
>I think something like below diff should work in all scenarios?

I'd've changed the asm side to use the constant limit.
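
To make the failure mode concrete, here is a minimal userspace sketch of the conditional quoted above. It is not kernel code and not the proposed fix: the `NR_CPUS` and `CONFIG_FORCE_NR_CPUS` values are stand-ins picked to match the example in the report, and `NR_CPU_IDS_IS_CONSTANT` and `cpu_nr_valid()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for Kconfig output; not taken from a real .config. */
#define NR_CPUS 320
#define CONFIG_FORCE_NR_CPUS 1

static_assert(NR_CPUS > 1, "sketch assumes an SMP build");

#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
/* Macro only: no object named nr_cpu_ids is emitted, so anything that
 * refers to the symbol by name at link time has nothing to resolve to. */
#define nr_cpu_ids ((unsigned int)NR_CPUS)
#define NR_CPU_IDS_IS_CONSTANT 1
#else
/* Real variable: the only configuration in which the symbol exists. */
unsigned int nr_cpu_ids = NR_CPUS;
#define NR_CPU_IDS_IS_CONSTANT 0
#endif

/* Hypothetical helper: the kind of bounds check a startup path performs. */
static int cpu_nr_valid(unsigned int cpu)
{
	return cpu < nr_cpu_ids;
}
```

C callers such as `cpu_nr_valid()` build in both configurations because the preprocessor substitutes the constant. Code that names `nr_cpu_ids` as a symbol only links in the `#else` branch, which is why comparing against the compile-time limit in the forced case sidesteps the undefined reference.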