Re: [PATCH 47/61] KVM: x86: Squash CPUID 0x2.0 insanity for modern CPUs

Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> · Mon, 24 Feb 2020 23:35:00 +0100

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:

> Rework CPUID 0x2.0 to be a normal CPUID leaf if it returns "01" in AL,
> i.e. EAX & 0xff.
>
> Long ago, Intel documented CPUID 0x2.0 as being a stateful leaf, e.g. a
> version of the SDM circa 1995 states:
>
>   The least-significant byte in register EAX (register AL) indicates the
>   number of times the CPUID instruction must be executed with an input
>   value of 2 to get a complete description of the processors's caches
>   and TLBs.  The Pentium Pro family of processors will return a 1.
>
> A 2000 version of the SDM only updated the paragraph to reference
> Intel's new processory family:
>
>   The first member of the family of Pentium 4 processors will return a 1.
>
> Fast forward to the present, and Intel's SDM now states:
>
>   The least-significant byte in register EAX (register AL) will always
>   return 01H.  Software should ignore this value and not interpret it as
>   an information descriptor.
>
> AMD's APM simply states that CPUID 0x2 is reserved.
>
> Given that CPUID itself was introduced in the Pentium, odds are good
> that the only Intel CPU family that *maybe* implemented a stateful CPUID
> was the P5.  Which obviously did not support VMX, or KVM.
>
> In other words, KVM's emulation of a stateful CPUID 0x2.0 has likely
> been dead code from the day it was introduced.  This is backed up by
> commit 0fdf8e59faa5c ("KVM: Fix cpuid iteration on multiple leaves per
> eac"), whichs show that the stateful iteration code was completely
> broken when it was introduced by commit 0771671749b59 ("KVM: Enhance
> guest cpuid management"), i.e. not actually tested.
>
> Although it's _extremely_ tempting to yank KVM's stateful code, leave it
> in for now but annotate all its code paths as "unlikely".  The code is
> relatively contained, and if by some miracle there is someone running KVM
> on a CPU with a stateful CPUID 0x2, more power to 'em.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
>  arch/x86/kvm/cpuid.c | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 47f61f4497fb..ab2a34337588 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -405,9 +405,6 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
>  		    &entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
>  
>  	switch (function) {
> -	case 2:
> -		entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
> -		break;
>  	case 4:
>  	case 7:
>  	case 0xb:
> @@ -483,17 +480,31 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		 * it since we emulate x2apic in software */
>  		cpuid_entry_set(entry, X86_FEATURE_X2APIC);
>  		break;
> -	/* function 2 entries are STATEFUL. That is, repeated cpuid commands
> -	 * may return different values. This forces us to get_cpu() before
> -	 * issuing the first command, and also to emulate this annoying behavior
> -	 * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */
>  	case 2:
> +		/*
> +		 * On ancient CPUs, function 2 entries are STATEFUL.  That is,
> +		 * CPUID(function=2, index=0) may return different results each
> +		 * time, with the least-significant byte in EAX enumerating the
> +		 * number of times software should do CPUID(2, 0).
> +		 *
> +		 * Modern CPUs (quite likely every CPU KVM has *ever* run on)
> +		 * are less idiotic.  Intel's SDM states that EAX & 0xff "will
> +		 * always return 01H. Software should ignore this value and not
> +		 * interpret it as an informational descriptor", while AMD's
> +		 * APM states that CPUID(2) is reserved.
> +		 */
> +		max_idx = entry->eax & 0xff;
> +		if (likely(max_idx <= 1))
> +			break;
> +
> +		entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
>  		entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
>  
> -		for (i = 1, max_idx = entry->eax & 0xff; i < max_idx; ++i) {
> +		for (i = 1; i < max_idx; ++i) {
>  			entry = do_host_cpuid(array, 2, 0);
>  			if (!entry)
>  				goto out;
> +			entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
>  		}
>  		break;
>  	/* functions 4 and 0x8000001d have additional index. */
> @@ -903,7 +914,7 @@ static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e,
>  		return 0;
>  	if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index)
>  		return 0;
> -	if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
> +	if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
>  	    !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT))
>  		return 0;
>  	return 1;
> @@ -920,7 +931,7 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
>  
>  		e = &vcpu->arch.cpuid_entries[i];
>  		if (is_matching_cpuid_entry(e, function, index)) {
> -			if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC)
> +			if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC))
>  				move_to_next_stateful_cpuid_entry(vcpu, i);
>  			best = e;
>  			break;

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

but your history digging results make me think that killing the whole
'statefulness' thing is not a bad idea at all :-)

-- 
Vitaly