On Thu, Oct 27, 2022 at 2:21 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> Passing the host topology to the guest is almost certainly wrong
> and will confuse the scheduler. In addition, several fields of
> these CPUID leaves vary on each processor; it is simply impossible to
> return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
> they can be passed to KVM_SET_CPUID2.
>
> The values that will most likely prevent confusion are all zeroes.
> Userspace will have to override it anyway if it wishes to present a
> specific topology to the guest.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>  Documentation/virt/kvm/api.rst | 14 ++++++++++++++
>  arch/x86/kvm/cpuid.c           | 32 ++++++++++++++++----------------
>  2 files changed, 30 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..20f4f6b302ff 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
>  It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
>  has enabled in-kernel emulation of the local APIC.
>
> +CPU topology
> +~~~~~~~~~~~~
> +
> +Several CPUID values include topology information for the host CPU:
> +0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems. Different
> +versions of KVM return different values for this information and userspace
> +should not rely on it. Currently they return all zeroes.
> +
> +If userspace wishes to set up a guest topology, it should be careful that
> +the values of these three leaves differ for each CPU. In particular,
> +the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
> +for 0x8000001e; the latter also encodes the core id and node id in bits
> +7:0 of EBX and ECX respectively.
> +
>  Obsolete ioctls and capabilities
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0810e93cbedc..164bfb7e7a16 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -759,16 +759,22 @@ struct kvm_cpuid_array {
>  	int nent;
>  };
>
> +static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
> +{
> +	if (array->nent >= array->maxnent)
> +		return NULL;
> +
> +	return &array->entries[array->nent++];
> +}
> +
>  static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
>  					      u32 function, u32 index)
>  {
> -	struct kvm_cpuid_entry2 *entry;
> +	struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
>
> -	if (array->nent >= array->maxnent)
> +	if (!entry)
>  		return NULL;
>
> -	entry = &array->entries[array->nent++];
> -
>  	memset(entry, 0, sizeof(*entry));
>  	entry->function = function;
>  	entry->index = index;
> @@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		entry->edx = edx.full;
>  		break;
>  	}
> -	/*
> -	 * Per Intel's SDM, the 0x1f is a superset of 0xb,
> -	 * thus they can be handled by common code.
> -	 */
>  	case 0x1f:
>  	case 0xb:
>  		/*
> -		 * Populate entries until the level type (ECX[15:8]) of the
> -		 * previous entry is zero. Note, CPUID EAX.{0x1f,0xb}.0 is
> -		 * the starting entry, filled by the primary do_host_cpuid().
> +		 * No topology; a valid topology is indicated by the presence
> +		 * of subleaf 1.
>  		 */
> -		for (i = 1; entry->ecx & 0xff00; ++i) {
> -			entry = do_host_cpuid(array, function, i);
> -			if (!entry)
> -				goto out;
> -		}
> +		entry->eax = entry->ebx = entry->ecx = 0;
>  		break;
>  	case 0xd: {
>  		u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
> @@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		entry->ebx = entry->ecx = entry->edx = 0;
>  		break;
>  	case 0x8000001e:
> +		/* Do not return host topology information. */
> +		entry->eax = entry->ebx = entry->ecx = 0;
> +		entry->edx = 0; /* reserved */
>  		break;
>  	case 0x8000001F:
>  		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
> --
> 2.31.1
>

This is a userspace ABI change that breaks existing hypervisors.
Please don't do this. Userspace ABIs are supposed to be inviolate.
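
For concreteness, the per-vCPU override that the quoted api.rst text asks
of userspace might look roughly like the sketch below. It is only an
illustration under stated assumptions: struct cpuid_entry is a local
stand-in for the function/index/eax..edx fields of struct kvm_cpuid_entry2
(so the snippet builds without <linux/kvm.h>), and struct vcpu_topo and
set_vcpu_topology_ids() are hypothetical names, not existing KVM or VMM
APIs. It patches only the fields the documentation names (APIC ID, core id,
node id); level types, shift counts, and the remaining bits of these leaves
are assumed to have been filled in by the VMM already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in for the relevant fields of struct kvm_cpuid_entry2
 * (assumption: used here so the sketch compiles without kernel headers).
 */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical per-vCPU identifiers chosen by the VMM. */
struct vcpu_topo {
	uint32_t apic_id;
	uint8_t core_id;
	uint8_t node_id;
};

/*
 * Write the per-vCPU identifiers into the topology leaves, following the
 * field layout given in the quoted documentation:
 *   0x0b / 0x1f : APIC ID in EDX for every subleaf
 *   0x8000001e  : APIC ID in EAX, core id in EBX[7:0], node id in ECX[7:0]
 * All other bits are left as the caller set them.
 * Returns the number of entries that were modified.
 */
size_t set_vcpu_topology_ids(struct cpuid_entry *entries, size_t nent,
			     const struct vcpu_topo *t)
{
	size_t i, touched = 0;

	assert(entries != NULL || nent == 0);
	assert(t != NULL);

	for (i = 0; i < nent; i++) {
		struct cpuid_entry *e = &entries[i];

		switch (e->function) {
		case 0x0b:
		case 0x1f:
			e->edx = t->apic_id;
			touched++;
			break;
		case 0x8000001e:
			e->eax = t->apic_id;
			e->ebx = (e->ebx & ~0xffu) | t->core_id;
			e->ecx = (e->ecx & ~0xffu) | t->node_id;
			touched++;
			break;
		default:
			break;
		}
	}

	return touched;
}
```

A VMM would presumably run something like this once per vCPU on its own
copy of the CPUID table before handing that copy to KVM_SET_CPUID2, since
the identifiers have to differ between vCPUs.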