Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes: > Rework CPUID 0x2.0 to be a normal CPUID leaf if it returns "01" in AL, > i.e. EAX & 0xff. > > Long ago, Intel documented CPUID 0x2.0 as being a stateful leaf, e.g. a > version of the SDM circa 1995 states: > > The least-significant byte in register EAX (register AL) indicates the > number of times the CPUID instruction must be executed with an input > value of 2 to get a complete description of the processors's caches > and TLBs. The Pentium Pro family of processors will return a 1. > > A 2000 version of the SDM only updated the paragraph to reference > Intel's new processory family: > > The first member of the family of Pentium 4 processors will return a 1. > > Fast forward to the present, and Intel's SDM now states: > > The least-significant byte in register EAX (register AL) will always > return 01H. Software should ignore this value and not interpret it as > an information descriptor. > > AMD's APM simply states that CPUID 0x2 is reserved. > > Given that CPUID itself was introduced in the Pentium, odds are good > that the only Intel CPU family that *maybe* implemented a stateful CPUID > was the P5. Which obviously did not support VMX, or KVM. > > In other words, KVM's emulation of a stateful CPUID 0x2.0 has likely > been dead code from the day it was introduced. This is backed up by > commit 0fdf8e59faa5c ("KVM: Fix cpuid iteration on multiple leaves per > eac"), whichs show that the stateful iteration code was completely > broken when it was introduced by commit 0771671749b59 ("KVM: Enhance > guest cpuid management"), i.e. not actually tested. > > Although it's _extremely_ tempting to yank KVM's stateful code, leave it > in for now but annotate all its code paths as "unlikely". The code is > relatively contained, and if by some miracle there is someone running KVM > on a CPU with a stateful CPUID 0x2, more power to 'em. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > --- > arch/x86/kvm/cpuid.c | 31 +++++++++++++++++++++---------- > 1 file changed, 21 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 47f61f4497fb..ab2a34337588 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -405,9 +405,6 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array, > &entry->eax, &entry->ebx, &entry->ecx, &entry->edx); > > switch (function) { > - case 2: > - entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; > - break; > case 4: > case 7: > case 0xb: > @@ -483,17 +480,31 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) > * it since we emulate x2apic in software */ > cpuid_entry_set(entry, X86_FEATURE_X2APIC); > break; > - /* function 2 entries are STATEFUL. That is, repeated cpuid commands > - * may return different values. This forces us to get_cpu() before > - * issuing the first command, and also to emulate this annoying behavior > - * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */ > case 2: > + /* > + * On ancient CPUs, function 2 entries are STATEFUL. That is, > + * CPUID(function=2, index=0) may return different results each > + * time, with the least-significant byte in EAX enumerating the > + * number of times software should do CPUID(2, 0). > + * > + * Modern CPUs (quite likely every CPU KVM has *ever* run on) > + * are less idiotic. Intel's SDM states that EAX & 0xff "will > + * always return 01H. Software should ignore this value and not > + * interpret it as an informational descriptor", while AMD's > + * APM states that CPUID(2) is reserved. > + */ > + max_idx = entry->eax & 0xff; > + if (likely(max_idx <= 1)) > + break; > + > + entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; > entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; > > - for (i = 1, max_idx = entry->eax & 0xff; i < max_idx; ++i) { > + for (i = 1; i < max_idx; ++i) { > entry = do_host_cpuid(array, 2, 0); > if (!entry) > goto out; > + entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC; > } > break; > /* functions 4 and 0x8000001d have additional index. */ > @@ -903,7 +914,7 @@ static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e, > return 0; > if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index) > return 0; > - if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && > + if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && > !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT)) > return 0; > return 1; > @@ -920,7 +931,7 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, > > e = &vcpu->arch.cpuid_entries[i]; > if (is_matching_cpuid_entry(e, function, index)) { > - if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) > + if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC)) > move_to_next_stateful_cpuid_entry(vcpu, i); > best = e; > break; Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> but your history digging results make me think that killing the whole 'statefulness' thing is not a bad idea at all :-) -- Vitaly