We have successfully booted a 32-socket, 3840 cpu HPE Sapphire Rapids system with your patchset applied, as found in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1 # cat /sys/kernel/debug/x86/topo/cpus/3839 initial_apicid: ff7 apicid: ff7 pkg_id: 31 die_id: 31 cu_id: 255 core_id: 59 logical_pkg_id: 31 logical_die_id: 31 llc_id: 3968 l2c_id: 4086 amd_node_id: 0 amd_nodes_per_pkg: 0 max_cores: 60 max_die_per_pkg: 1 smp_num_siblings: 2 We were also able to boot a 60-socket, 2160 cpu HPE SuperDome Flex (SkyLake) with the patchset. # cat /sys/kernel/debug/x86/topo/cpus/2159 initial_apicid: ef7 apicid: ef7 pkg_id: 59 die_id: 59 cu_id: 255 core_id: 27 logical_pkg_id: 59 logical_die_id: 59 llc_id: 3776 l2c_id: 3830 amd_node_id: 0 amd_nodes_per_pkg: 0 max_cores: 18 max_die_per_pkg: 1 smp_num_siblings: 2 On Mon, Jul 24, 2023 at 07:43:50PM +0200, Thomas Gleixner wrote: > Hi! > > A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at > the way how topology is evaluated. That "fix" is just yet another cure the > sypmtom hack which completely ignores the underlying disaster. > > The way how topology evaluation works is to overwrite the relevant > variables as often as possible. E.g. smp_num_siblings gets overwritten a > gazillion times, which is wrong to begin with. The boot CPU writes it 3 > times, each AP two times. > > What's worse is that this just works by chance on hybrid systems due to the > fact that the existing ones all seem to boot on a P-Core which has > SMT. Would it boot on a E-Core which has no SMT, then parts of the early > topology evaluation including the primary thread mask which is required for > parallel CPU bringup would be completely wrong. Overwriting it later on > with the correct value does not help at all. > > What's wrong today with hybrid already is the number of cores per package. > On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per > package is evaluated to be 12. Which is not further surprising because the > CPUID 0xb/0x1f parser looks at the number of logical processors at core > level and divides them by the number of SMP siblings. > > 24 / 2 = 12 > > Just that this CPU has obviously 16 cores not 12. > > It's is even clearly documented in the SDM that this is wrong. > > "Bits 15-00: The number of logical processors across all instances of this > domain within the next higher- scoped domain relative to this current > logical processor. (For example, in a processor socket/package comprising > "M" dies of "N" cores each, where each core has "L" processors, the > "die" domain subleaf value of this field would be M*N*L. In an asymmetric > topology this would be the summation of the value across the lower domain > level instances to create each upper domain level instance.) This number > reflects configuration as shipped by Intel. Note, software must not use > this field to enumerate processor topology. > > Software must not use the value of EBX[15:0] to enumerate processor topology > of the system. The value is only intended for display and diagnostic purposes. > The actual number of logical processors available to BIOS/OS/Applications > may be different from the value of EBX[15:0], depending on software and > platform hardware configurations." > > This "_NOT_ to use for topology evaluation" sentence existed even before > hybrid came along and got ignored. The code worked by chance, but with > hybrid all bets are off. The code completely falls apart once CPUID leaf > 0x1f enumerates any topology level between CORE and DIE, but that's not a > suprise. > > The proper thing to do is to actually evaluate the full topology including > the non-present (hotpluggable) CPUs based on the APICIDs which are provided > by the firmware and a proper topology domain level parser. This can exactly > tell the number of physical packages, logical packages etc. _before_ even > booting a single AP. All of that can be evaluated upfront. > > Aside of that there are too many places which do their own topology > evaluation, but there is absolutely no central point which can actually > provide all of that information in a consistent way. This needs to change. > > This series implements another piece towards this: sane CPUID evaluation, > which is done at _one_ place in a proper well defined order instead of > having it sprinkled all over the CPUID evaluation code. > > At the end of this series this is pretty much bug compatible with the > current CPUID evaluation code in respect to the cores per package > evaluation, but it gets rid of overwriting things like smp_num_siblings, > which is now written once, but is still not capable to work correctly on a > hybrid machine which boots from a non SMT core. These things can only be > fixed up in the next step(s). > > When I tried to go further with this I ran into yet another pile of > historical layers of duct tape and haywire with a gazillion of random > variables sprinkled all over the place. That's still work in progress. It > actually works, but the last step which switches over is not yet in a shape > that can be easily reviewed. Stay tuned. > > The series is based on the APIC cleanup series: > > https://lore.kernel.org/lkml/20230724131206.500814398@xxxxxxxxxxxxx > > and also available on top of that from git: > > git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1 > > Thanks, > > tglx > --- > arch/x86/kernel/cpu/topology.c | 168 -------------------- > b/Documentation/arch/x86/topology.rst | 12 - > b/arch/x86/events/amd/core.c | 2 > b/arch/x86/events/amd/uncore.c | 2 > b/arch/x86/events/intel/uncore.c | 2 > b/arch/x86/include/asm/apic.h | 1 > b/arch/x86/include/asm/cacheinfo.h | 3 > b/arch/x86/include/asm/cpuid.h | 32 +++ > b/arch/x86/include/asm/processor.h | 58 ++++-- > b/arch/x86/include/asm/smp.h | 2 > b/arch/x86/include/asm/topology.h | 51 +++++- > b/arch/x86/include/asm/x86_init.h | 2 > b/arch/x86/kernel/amd_nb.c | 8 > b/arch/x86/kernel/apic/apic_flat_64.c | 7 > b/arch/x86/kernel/apic/apic_noop.c | 3 > b/arch/x86/kernel/apic/apic_numachip.c | 11 - > b/arch/x86/kernel/apic/bigsmp_32.c | 6 > b/arch/x86/kernel/apic/local.h | 1 > b/arch/x86/kernel/apic/probe_32.c | 6 > b/arch/x86/kernel/apic/x2apic_cluster.c | 1 > b/arch/x86/kernel/apic/x2apic_phys.c | 6 > b/arch/x86/kernel/apic/x2apic_uv_x.c | 63 +------ > b/arch/x86/kernel/cpu/Makefile | 5 > b/arch/x86/kernel/cpu/amd.c | 156 ------------------ > b/arch/x86/kernel/cpu/cacheinfo.c | 51 ++---- > b/arch/x86/kernel/cpu/centaur.c | 4 > b/arch/x86/kernel/cpu/common.c | 108 +----------- > b/arch/x86/kernel/cpu/cpu.h | 14 + > b/arch/x86/kernel/cpu/debugfs.c | 98 +++++++++++ > b/arch/x86/kernel/cpu/hygon.c | 133 --------------- > b/arch/x86/kernel/cpu/intel.c | 38 ---- > b/arch/x86/kernel/cpu/mce/amd.c | 4 > b/arch/x86/kernel/cpu/mce/apei.c | 4 > b/arch/x86/kernel/cpu/mce/core.c | 4 > b/arch/x86/kernel/cpu/mce/inject.c | 7 > b/arch/x86/kernel/cpu/proc.c | 8 > b/arch/x86/kernel/cpu/topology.h | 51 ++++++ > b/arch/x86/kernel/cpu/topology_amd.c | 179 +++++++++++++++++++++ > b/arch/x86/kernel/cpu/topology_common.c | 233 ++++++++++++++++++++++++++++ > b/arch/x86/kernel/cpu/topology_ext.c | 136 ++++++++++++++++ > b/arch/x86/kernel/cpu/zhaoxin.c | 18 -- > b/arch/x86/kernel/smpboot.c | 64 ++++--- > b/arch/x86/kernel/vsmp_64.c | 13 - > b/arch/x86/mm/amdtopology.c | 35 +--- > b/arch/x86/xen/apic.c | 8 > b/drivers/edac/amd64_edac.c | 4 > b/drivers/edac/mce_amd.c | 4 > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 > b/drivers/hwmon/fam15h_power.c | 7 > b/drivers/scsi/lpfc/lpfc_init.c | 8 > b/drivers/virt/acrn/hsm.c | 2 > 51 files changed, 964 insertions(+), 881 deletions(-) > > >