On Mon, 2024-07-08 at 13:53 -0700, Sean Christopherson wrote: > On Thu, Jul 04, 2024, Maxim Levitsky wrote: > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote: > > > Add a macro for use in kvm_set_cpu_caps() to automagically initialize > > > features that KVM wants to support based solely on the CPU's capabilities, > > > e.g. KVM advertises LA57 support if it's available in hardware, even if > > > the host kernel isn't utilizing 57-bit virtual addresses. > > > > > > Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps > > > based on raw CPUID, i.e. will clear features bits that aren't supported in > > > hardware, and simply force-set the capability before applying the mask. > > > > > > Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so > > > avoid extra CPUID lookups, and a future commit will harden the entire > > > family of *F() macros to assert (at compile time) that every feature being > > > allowed is part of the capability word being processed, i.e. using a macro > > > will bring more advantages in the future. > > > > Could you explain what do you mean by "extra CPUID lookups"? > > cpuid_ecx(7) incurs a CPUID to read the raw info, on top of the CPUID that is > executed by kvm_cpu_cap_init() (kvm_cpu_cap_mask() as of this patch). Obviously > not a big deal, but it's an extra VM-Exit when running as a VM. > > > > +/* > > > + * Raw Feature - For features that KVM supports based purely on raw host CPUID, > > > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature. > > > + * Simply force set the feature in KVM's capabilities, raw CPUID support will > > > + * be factored in by kvm_cpu_cap_mask(). > > > + */ > > > +#define RAW_F(name) \ > > > +({ \ > > > + kvm_cpu_cap_set(X86_FEATURE_##name); \ > > > + F(name); \ > > > +}) > > > + > > > /* > > > * Magic value used by KVM when querying userspace-provided CPUID entries and > > > * doesn't care about the CPIUD index because the index of the function in > > > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void) > > > F(AVX512VL)); > > > > > > kvm_cpu_cap_mask(CPUID_7_ECX, > > > - F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) | > > > + F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) | > > > F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | > > > F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | > > > F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ | > > > F(SGX_LC) | F(BUS_LOCK_DETECT) > > > ); > > > - /* Set LA57 based on hardware capability. */ > > > - if (cpuid_ecx(7) & F(LA57)) > > > - kvm_cpu_cap_set(X86_FEATURE_LA57); > > > > > > /* > > > * PKU not yet implemented for shadow paging and requires OSPKE > > > > Putting a function call into a macro which evaluates into a bitmask is > > somewhat misleading, but let it be... > > > > IMHO in long term, it might be better to rip the whole huge 'or'ed mess, and replace > > it with a list of statements, along with comments for all unusual cases. > > As in something like this? > > kvm_cpu_cap_init(AVX512VBMI); > kvm_cpu_cap_init_raw(LA57); > kvm_cpu_cap_init(PKU); > ... > kvm_cpu_cap_init(BUS_LOCK_DETECT); > > kvm_cpu_cap_init_aliased(CPUID_8000_0001_EDX, FPU); > > ... > > kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX1); > kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX2); > kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX_EDECCSSA); > > The tricky parts are incorporating raw CPUID into the masking and handling features > that KVM _doesn't_ support. For raw CPUID, we could simply do CPUID every time, or > pre-fill an array to avoid hundreds of CPUIDs that are largely redudant. In terms of performance, again this code is run once per kvm module load, so even if it does something truly gross performance wise, it's not a problem, even if run nested. > > But I don't see a way to mask off unsupported features without losing the > compile-time protections that the current code provides. And even if we took a > big hammer approach, e.g. finalized masking for all words at the very end, we'd > still need to carry state across each statement, i.e. we'd still need the bitwise-OR > and mask behavior, it would just be buried in helpers/macros. Can you elaborate on this? For example let's say this: kvm_cpu_cap_init(CPUID_7_0_EBX, F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) | F(AVX512VL)); This will be replaced with: kvm_cpu_cap_clear_all(CPUID_7_0_EBX); kvm_cpu_cap_init(FSGSBASE); kvm_cpu_cap_init(TSC_ADJUST, CAP_EMULATED); .. kvm_cpu_cap_init(AVX512VL); Then each 'kvm_cpu_cap_init' will opt-in to set a bit if supported in host cpuid, or always opt-in for emulated features, etc.... Host CPUID can indeed be cached, if extra host cpuid queries cause too slow (e.g 1 second) delay when nested. > > I suspect the generated code will be larger, but I doubt that will actually be > problematic. Yes, 100% agree. > The written code will also be more verbose (roughly 4x since we > tend to squeeze 4 features per line) It will be about as long as the list of macros in the cpufeatures.h, where all features are nicely ordered by cpuid leaves. In this case I consider verbose long code to be an improvement. IMHO the OR'ed mask of macros is just too terse, hard to parse. It was borderline OK, before this patch series because it only contained features, but now it also contains various modifiers, IMHO it's just hard to notice that EMUL_F at that corner... > , and it will be harder to ensure initialization > of features in a given word are all co-located. Actually co-location won't be needed. We can first copy the caps from boot_cpu_data, then zero all the leaves that we initialize ourselves. After that we can initialize opt-in features in any order - it will still be sorted by CPUID leaves but even if the order is broken (e.g due to cherry-pick or something), it won't cause any issues. > > I definitely don't hate the idea, but I don't think it will be a clear "win" either. > Unless someone feels strongly about pursuing this approach, I'll add to the "things > to explore later" list. > Please do consider this, I am almost sure that whoever will need to read this code later (could be you...), will thank you. Best regards, Maxim Levitsky