> On May 31, 2023, at 12:30 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Wed, May 31, 2023, Jon Kohler wrote: >> >>> On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote: >>> >>> On 5/30/23 13:01, Jon Kohler wrote: >>> Is that the only problem? kvm_load_guest_xsave_state() seems to have >>> some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't >>> imagine that KVM guests can even use PKRU if this code is compiled out. > > ... > >>>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c >>>> index 0bab497c9436..211ef82b53e3 100644 >>>> --- a/arch/x86/kernel/fpu/xstate.c >>>> +++ b/arch/x86/kernel/fpu/xstate.c >>>> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size) >>>> unsigned short cid = xsave_cpuid_features[i]; >>>> >>>> /* Careful: X86_FEATURE_FPU is 0! */ >>>> - if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid)) >>>> + if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) || >>>> + DISABLED_MASK_BIT_SET(cid)) >>>> fpu_kernel_cfg.max_features &= ~BIT_ULL(i); >>>> } >>> >>> I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than >>> using DISABLED_MASK_BIT_SET() directly. > > +1, xstate.c uses cpu_feature_enabled() all over the place, and IMO effectively > open coding cpu_feature_enabled() yields less intuitive code. Ack, thank you for the feedback > > And on the KVM side, we can and should replace the #ifdef with cpu_feature_enabled() > (I'll post a patch), as modern compilers are clever enough to completely optimize > out the code when CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n. At that point, using > cpu_feature_enabled() in both KVM and xstate.c will provide a nice bit of symmetry. Ok, thanks for helping to clean that up, I appreciate it. > > Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code: in the > unlikely scenario that the compiler doesn't resolve "cid" to a compile-time > constant value, cpu_feature_enabled() won't query DISABLED_MASK_BIT_SET(). I don't > see any other use of cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and > the below compiles with my config, so I think/hope we can just require a compile-time > constant when using cpu_feature_enabled(). > Yea I think that should work. I’ll club that into v2 of this patch. > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h > index ce0c8f7d3218..886200fbf8d9 100644 > --- a/arch/x86/include/asm/cpufeature.h > +++ b/arch/x86/include/asm/cpufeature.h > @@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; > * supporting a possible guest feature where host support for it > * is not relevant. > */ > -#define cpu_feature_enabled(bit) \ > - (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit)) > +#define cpu_feature_enabled(bit) \ > +({ \ > + BUILD_BUG_ON(!__builtin_constant_p(bit)); \ > + DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit); \ > +}) > > #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) > > Caveat #2: Using cpu_feature_enabled() could subtly break KVM, as KVM advertises > support for features based on boot_cpu_data. E.g. if a feature were disabled by > Kconfig but present in hardware, KVM would allow the guest to use the feature > without properly context switching the data. PKU isn't problematic because KVM > explicitly gates PKU on boot_cpu_has(X86_FEATURE_OSPKE), but KVM learned that > lesson the hard way (see commit c469268cd523, "KVM: x86: block guest protection > keys unless the host has them enabled"). Exposing a feature that's disabled in > the host isn't completely absurd, e.g. KVM already effectively does this for MPX. > The only reason using cpu_feature_enabled() wouldn't be problematic for MPX is > because there's no longer a Kconfig for MPX. > > I'm totally ok gating xfeature bits on cpu_feature_enabled(), but there should be > a prep patch for KVM to clear features bits in kvm_cpu_caps if the corresponding > XCR0/XSS bit is not set in the host. If KVM ever wants to expose an xstate feature > (other than MPX) that's disabled in the host, then we can revisit > fpu__init_system_xstate(). But we need to ensure the "failure" mode is that > KVM doesn't advertise the feature, as opposed to advertising a feature without > without context switching its data. Looking into this, trying to understand the comment about a feature being used without context switching its data. In __kvm_x86_vendor_init() we’re already populating host_xcr0 using the XCR_XFEATURE_ENABLED_MASK, which should be populated on boot by fpu__init_cpu_xstate(), which happens almost immediately after the code that I modified in this commit. That then flows into guest_supported_xcr0 (as well as user_xfeatures). guest_supported_xcr0 is then plumbed into __kvm_set_xcr, which specifically says that we’re using that to prevent the guest from setting bits that we won’t save in the first place. /* * Do not allow the guest to set bits that we do not support * saving. However, xcr0 bit 0 is always set, even if the * emulated CPU does not support XSAVE (see kvm_vcpu_reset()). */ valid_bits = vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP; Wouldn’t this mean that the *guest* xstate initialization would not see a given feature too and take care of the problem naturally? Or are you saying you’d want an even more detailed clearing? > >>> But, I guess this probably also isn't a big deal for _most_ people. Any >>> sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS >>> since it's pretty widespread on modern CPUs and works across Intel and >>> AMD now. >> >> Ack, I’m using PKU as the key example here, but looking forward this is more of a >> correctness thing than anything else. If for any reason, any xsave feature is disabled >> In the way that PKU is disabled, it will slip thru the cracks. > > I'd be careful about billing this as a correctness thing. See above. Fair enough. I’ll see about simplifying the commit msg when we get thru the comments on this thread.