Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 8 Jul 2024 13:53:03 -0700

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Add a macro for use in kvm_set_cpu_caps() to automagically initialize
> > features that KVM wants to support based solely on the CPU's capabilities,
> > e.g. KVM advertises LA57 support if it's available in hardware, even if
> > the host kernel isn't utilizing 57-bit virtual addresses.
> > 
> > Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
> > based on raw CPUID, i.e. will clear features bits that aren't supported in
> > hardware, and simply force-set the capability before applying the mask.
> > 
> > Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
> > avoid extra CPUID lookups, and a future commit will harden the entire
> > family of *F() macros to assert (at compile time) that every feature being
> > allowed is part of the capability word being processed, i.e. using a macro
> > will bring more advantages in the future.
> 
> Could you explain what do you mean by "extra CPUID lookups"?

cpuid_ecx(7) incurs a CPUID to read the raw info, on top of the CPUID that is
executed by kvm_cpu_cap_init() (kvm_cpu_cap_mask() as of this patch).  Obviously
not a big deal, but it's an extra VM-Exit when running as a VM.

> > +/*
> > + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > + * be factored in by kvm_cpu_cap_mask().
> > + */
> > +#define RAW_F(name)						\
> > +({								\
> > +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> > +	F(name);						\
> > +})
> > +
> >  /*
> >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> >   * doesn't care about the CPIUD index because the index of the function in
> > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
> >  		F(AVX512VL));
> >  
> >  	kvm_cpu_cap_mask(CPUID_7_ECX,
> > -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> >  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
> >  		F(SGX_LC) | F(BUS_LOCK_DETECT)
> >  	);
> > -	/* Set LA57 based on hardware capability. */
> > -	if (cpuid_ecx(7) & F(LA57))
> > -		kvm_cpu_cap_set(X86_FEATURE_LA57);
> >  
> >  	/*
> >  	 * PKU not yet implemented for shadow paging and requires OSPKE
> 
> Putting a function call into a macro which evaluates into a bitmask is
> somewhat misleading, but let it be...
> 
> IMHO in long term, it might be better to rip the whole huge 'or'ed mess, and replace
> it with a list of statements, along with comments for all unusual cases.

As in something like this?

	kvm_cpu_cap_init(AVX512VBMI);
	kvm_cpu_cap_init_raw(LA57);
	kvm_cpu_cap_init(PKU);
	...
	kvm_cpu_cap_init(BUS_LOCK_DETECT);

	kvm_cpu_cap_init_aliased(CPUID_8000_0001_EDX, FPU);

	...

	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX1);
	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX2);
	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX_EDECCSSA);

The tricky parts are incorporating raw CPUID into the masking and handling features
that KVM _doesn't_ support.  For raw CPUID, we could simply do CPUID every time, or
pre-fill an array to avoid hundreds of CPUIDs that are largely redudant.

But I don't see a way to mask off unsupported features without losing the
compile-time protections that the current code provides.  And even if we took a
big hammer approach, e.g. finalized masking for all words at the very end, we'd
still need to carry state across each statement, i.e. we'd still need the bitwise-OR
and mask  behavior, it would just be buried in helpers/macros.

I suspect the generated code will be larger, but I doubt that will actually be
problematic.  The written code will also be more verbose (roughly 4x since we
tend to squeeze 4 features per line), and it will be harder to ensure initialization
of features in a given word are all co-located.

I definitely don't hate the idea, but I don't think it will be a clear "win" either.
Unless someone feels strongly about pursuing this approach, I'll add to the "things
to explore later" list.