On Fri, Nov 10, 2023 at 11:08:06AM -0600, Babu Moger wrote:
> Date: Fri, 10 Nov 2023 11:08:06 -0600
> From: Babu Moger <babu.moger@xxxxxxx>
> Subject: [PATCH] target/i386: Fix CPUID encoding of Fn8000001E_ECX
> X-Mailer: git-send-email 2.34.1
>
> Observed the following failure while booting the SEV-SNP guest and the
> guest fails to boot with the smp parameters:
> "-smp 192,sockets=1,dies=12,cores=8,threads=2".
>
> qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
> qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
> provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000
> expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000
> qemu-system-x86_64: SEV-SNP: failed update CPUID page
>
> Reason for the failure is due to overflowing of bits used for "Node per
> processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and
> can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills
> over into the reserved bits. In the case of SEV-SNP, this causes CPUID
> enforcement failure and guest fails to boot.
>
> The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
> =================================================================
> Bits    Description
> 31:11   Reserved.
>
> 10:8    NodesPerProcessor: Node per processor. Read-only.
>         ValidValues:
>         Value   Description
>         0h      1 node per processor.
>         7h-1h   Reserved.
>
> 7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
> =================================================================
>
> As in the spec, the valid value for "node per processor" is 0 and rest
> are reserved.
>
> Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed
> that there were cases where "node per processor" can be more than 1. It
> is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later
> CPUs, the linux kernel does not use this information to build the L3
> topology.
>
> Also noted that the CPUID Function 0x8000001E_ECX is available only when
> TOPOEXT feature is enabled.

One additional query: this dependency is not reflected in
encode_topo_cpuid8000001e(). Should TOPOEXT be checked there?

> This feature is enabled only for EPYC(F17h)
> or later processors. So, previous generation of processors do not
> enumerate 0x8000001E_ECX leaf.
>
> There could be some corner cases where the older guests could enable the
> TOPOEXT feature by running with -cpu host, in which case legacy guests
> might notice the topology change. To address those cases introduced a
> new CPU property "legacy-multi-node". It will be true for older machine
> types to maintain compatibility. By default, it will be false, so new
> decoding will be used going forward.
>
> The documentation is taken from Preliminary Processor Programming
> Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
> Rev 0.25 - Oct 6, 2022.
>
> Cc: qemu-stable@xxxxxxxxxx
> Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---

[snip]

> +++ b/target/i386/cpu.h
> @@ -1988,6 +1988,7 @@ struct ArchCPU {
>       * If true present the old cache topology information
>       */
>      bool legacy_cache;
> +    bool legacy_multi_node;

This property deserves a comment, like the one legacy_cache has above.

>
>      /* Compatibility bits for old machine types: */
>      bool enable_cpuid_0xb;
> --
> 2.34.1
>

Just the above nit, otherwise,

Reviewed-by: Zhao Liu <zhao1.liu@xxxxxxxxx>
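
For reference, the overflow described in the commit message can be reproduced with a small standalone sketch. This is not the QEMU code: the helper names below are hypothetical, and the use of (dies - 1) as the field value is an assumption inferred from the ecx value 0x00000b00 in the log. Only the bit layout (NodeId in 7:0, NodesPerProcessor in 10:8, reserved 31:11) comes from the quoted PPR excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of CPUID Fn8000001E_ECX, per the PPR excerpt quoted above. */
#define NODE_ID_MASK         0xFFu  /* NodeId, bits 7:0 */
#define NODES_PER_PROC_SHIFT 8      /* NodesPerProcessor starts at bit 8 */
#define NODES_PER_PROC_MASK  0x7u   /* NodesPerProcessor is 3 bits wide */
#define RESERVED_SHIFT       11     /* bits 31:11 are reserved */

/*
 * Hypothetical helper: place (dies - 1) at bit 8 without masking it to the
 * 3-bit field width. The (dies - 1) encoding is an assumption taken from
 * the log output, not from the patch itself.
 */
static inline uint32_t ecx_unmasked(uint32_t dies_per_pkg, uint32_t node_id)
{
    assert(dies_per_pkg >= 1);
    return ((dies_per_pkg - 1) << NODES_PER_PROC_SHIFT) |
           (node_id & NODE_ID_MASK);
}

/* Hypothetical helper: same value, truncated to the 3-bit field. */
static inline uint32_t ecx_truncated(uint32_t dies_per_pkg, uint32_t node_id)
{
    assert(dies_per_pkg >= 1);
    return (((dies_per_pkg - 1) & NODES_PER_PROC_MASK) << NODES_PER_PROC_SHIFT) |
           (node_id & NODE_ID_MASK);
}

/* Hypothetical helper: nonzero if any reserved bit (31:11) is set. */
static inline int ecx_has_reserved_bits(uint32_t ecx)
{
    return (ecx >> RESERVED_SHIFT) != 0;
}
```

With dies_per_pkg = 12 and node_id = 0, ecx_unmasked() gives 0xb00, which sets reserved bit 11, while ecx_truncated() gives 0x300. These are the "provided" and "expected" ecx values in the log above. Up to 8 dies the unmasked value still fits within bits 10:8.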