On Fri, Feb 24, 2023, Aaron Lewis wrote: > Be a good citizen and don't allow any of the supported MPX xfeatures[1] > to be set if they can't all be set. That way userspace or a guest > doesn't fail if it attempts to set them in XCR0. > > [1] CPUID.(EAX=0DH,ECX=0):EAX.BNDREGS[bit-3] > CPUID.(EAX=0DH,ECX=0):EAX.BNDCSR[bit-4] > > Suggested-by: Jim Mattson <jmattson@xxxxxxxxxx> > Signed-off-by: Aaron Lewis <aaronlewis@xxxxxxxxxx> > --- > arch/x86/kvm/cpuid.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index e1165c196970..b2e7407cd114 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -60,9 +60,22 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted) > return ret; > } > > +static u64 sanitize_xcr0(u64 xcr0) > +{ > + u64 mask; > + > + mask = XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR; > + if ((xcr0 & mask) != mask) > + xcr0 &= ~mask; > + > + return xcr0; > +} > + > u64 kvm_permitted_xcr0(void) > { > - return kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); > + u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); > + > + return sanitize_xcr0(permitted_xcr0); Sanitizing XCR0 in this "permitted" helper conflates two separate things (sanity vs. permissions) and overall leads to a really, really confusing mess. E.g. kvm_vcpu_ioctl_x86_set_xsave(), cpuid_get_supported_xcr0(), kvm_x86_dev_get_attr(), and kvm_mpx_supported() all use the non-sanitized value, which appears nonsensical, but is actually correct because the whole "permitted" thing is funky. On a related topic, isn't cpuid_get_supported_xcr0() buggy? Ah, no, the wonderfully named kvm_check_cpuid() ensures fpu_enable_guest_xfd_features() is invoked before the result can actually be consumed. Ugh, and past me created a royal mess with SGX subleaf 0x12.1. KVM shouldn't be manipulating guest CPUID just to ensure userspace doesn't bypass the guest's permitted XCR0 by launching an enclave. I.e. the SGX code that consumes cpuid_get_supported_xcr0() in __kvm_update_cpuid_runtime() should not exists. I'm 99.9% certain we can simply nuke that code without breaking userspace, e.g. QEMU correctly sets the data. LOL, or at least it used to. QEMU commit 301e90675c ("target/i386: Enable support for XSAVES based features") completely broke SGX by using allowed XSS instead of XCR0. Anyways, back to this mess. I was going to say "As I suggested in v2[0], sanitize kvm_caps.supported_xcr0 itself", but after a _lot_ of starting, I realized (or finally remembered) that that doesn't work because supported_xcr0 is already sane, the issue is specifically with xstate_get_guest_group_perm() clearing XTILE_DATA and leaving behind the XTILE_CFG landmine. And for all intents and purposes, supported_xcr0 _must_ be sane since it comes from host_xcr0, which is pulled straight from hardware. So barring a _completely_ busted CPU or hypervisor _and_ a busted kernel, supported_xcr0 itself can never be insane. So not only is my proposal to sanitize supported_xcr0 not viable, the entire idea of generically sanitizing XCR0 is bunk. The _only_ issue is XTILE_CFG, and it's very specifically because of the dynamic feature crud. So rather than add a bunch of logic to sanitize XCR0, which is at best superfluous and at worst misleading (again, only XTILE_CFG can ever be insane), I think we should just special case XTILE_CFG and add a big fat comment explaining why KVM further manipulates the "supported" XCR0. Ah, and that's more or less what Aaron had in v1, but then Jim asked about MPX[1] and things went off the rails (I'm just relieved that I wasn't the one to send you awry and then complain about the result). Aaron and/or Mingwei, can you give the attached patches a spin? Patch 1 is a slightly reworked version of Aaron's patch 1, and patch 2 implements what I just described (guts of patch 2 also pasted below). If things look good, I'll post a v4 of this series. [0] https://lore.kernel.org/all/Y7R36wsXn3JqwfEv@xxxxxxxxxx [1] https://lore.kernel.org/all/CALMp9eQD8EpS50A0iAxsoaW-_yFmWERWw6rbAh9VSEJjDrMkNQ@xxxxxxxxxxxxxx --- arch/x86/kvm/x86.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index b6c6988d99b5..ae235bc2b9bc 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -3,6 +3,7 @@ #define ARCH_X86_KVM_X86_H #include <linux/kvm_host.h> +#include <asm/fpu/xstate.h> #include <asm/mce.h> #include <asm/pvclock.h> #include "kvm_cache_regs.h" @@ -325,7 +326,22 @@ extern bool enable_pmu; */ static inline u64 kvm_get_filtered_xcr0(void) { - return kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); + u64 supported_xcr0 = kvm_caps.supported_xcr0; + + BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA); + + if (supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC) { + supported_xcr0 &= xstate_get_guest_group_perm(); + + /* + * Treat XTILE_CFG as unsupported if the current process isn't + * allowed to use XTILE_DATA, as attempting to set XTILE_CFG in + * XCR0 without setting XTILE_DATA is architecturally illegal. + */ + if (!(supported_xcr0 & XFEATURE_MASK_XTILE_DATA)) + supported_xcr0 &= XFEATURE_MASK_XTILE_CFG; + } + return supported_xcr0; } static inline bool kvm_mpx_supported(void) --
>From 0c0a6bdcd8edd04874344dd71c135bb211605c25 Mon Sep 17 00:00:00 2001 From: Aaron Lewis <aaronlewis@xxxxxxxxxx> Date: Fri, 24 Feb 2023 22:36:00 +0000 Subject: [PATCH 1/2] KVM: x86: Add a helper to handle filtering of unpermitted XCR0 features Add a helper, kvm_get_filtered_xcr0(), to dedup code that needs to account for XCR0 features that require explicit opt-in on a per-process basis. In addition to documenting when KVM should/shouldn't consult xstate_get_guest_group_perm(), the helper will also allow sanitizing the filtered XCR0 to avoid enumerating architecturally illegal XCR0 values, e.g. XTILE_CFG without XTILE_DATA. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@xxxxxxxxxx> Reviewed-by: Mingwei Zhang <mizhang@xxxxxxxxxx> Link: https://lore.kernel.org/r/20230224223607.1580880-2-aaronlewis@xxxxxxxxxx [sean: rename helper, move to x86.h, massage changelog] Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/x86.c | 4 +--- arch/x86/kvm/x86.h | 13 +++++++++++++ 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 1ad3bde72526..98a46e46ee9e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1002,7 +1002,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) entry->eax = entry->ebx = entry->ecx = 0; break; case 0xd: { - u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); + u64 permitted_xcr0 = kvm_get_filtered_xcr0(); u64 permitted_xss = kvm_caps.supported_xss; entry->eax &= permitted_xcr0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3fab192862d4..0e5a0078bb4d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4547,9 +4547,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 0; break; case KVM_CAP_XSAVE2: { - u64 guest_perm = xstate_get_guest_group_perm(); - - r = xstate_required_size(kvm_caps.supported_xcr0 & guest_perm, false); + r = xstate_required_size(kvm_get_filtered_xcr0(), false); if (r < sizeof(struct kvm_xsave)) r = sizeof(struct kvm_xsave); break; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 203fb6640b5b..b6c6988d99b5 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -315,6 +315,19 @@ extern struct kvm_caps kvm_caps; extern bool enable_pmu; +/* + * Get a filtered version of KVM's supported XCR0 that strips out dynamic + * features for which the current process doesn't (yet) have permission to use. + * This is intended to be used only when enumerating support to userspace, + * e.g. in KVM_GET_SUPPORTED_CPUID and KVM_CAP_XSAVE2, it does NOT need to be + * used to check/restrict guest behavior as KVM rejects KVM_SET_CPUID{2} if + * userspace attempts to enable unpermitted features. + */ +static inline u64 kvm_get_filtered_xcr0(void) +{ + return kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); +} + static inline bool kvm_mpx_supported(void) { return (kvm_caps.supported_xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) base-commit: 68f7c82ab1b8c7057b0c241907ff7906c7407e6d -- 2.40.0.348.gf938b09366-goog
>From 39775081bd99ef8dc8b38852f5dad82ca216de5e Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc@xxxxxxxxxx> Date: Thu, 23 Mar 2023 12:52:29 -0700 Subject: [PATCH 2/2] KVM: x86: Filter out XTILE_CFG if XTILE_DATA isn't permitted Filter out XTILE_CFG from the supported XCR0 reported to userspace if the current process doesn't have access to XTILE_DATA. Attempting to set XTILE_CFG in XCR0 will #GP if XTILE_DATA is also not set, and so keeping XTILE_CFG as supported results in xplosions if userspace feeds KVM_GET_SUPPORTED_CPUID back into KVM and the guest doesn't sanity check CPUID. Fixes: 445ecdf79be0 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID") Reported-by: Aaron Lewis <aaronlewis@xxxxxxxxxx> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/x86.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index b6c6988d99b5..ae235bc2b9bc 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -3,6 +3,7 @@ #define ARCH_X86_KVM_X86_H #include <linux/kvm_host.h> +#include <asm/fpu/xstate.h> #include <asm/mce.h> #include <asm/pvclock.h> #include "kvm_cache_regs.h" @@ -325,7 +326,22 @@ extern bool enable_pmu; */ static inline u64 kvm_get_filtered_xcr0(void) { - return kvm_caps.supported_xcr0 & xstate_get_guest_group_perm(); + u64 supported_xcr0 = kvm_caps.supported_xcr0; + + BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA); + + if (supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC) { + supported_xcr0 &= xstate_get_guest_group_perm(); + + /* + * Treat XTILE_CFG as unsupported if the current process isn't + * allowed to use XTILE_DATA, as attempting to set XTILE_CFG in + * XCR0 without setting XTILE_DATA is architecturally illegal. + */ + if (!(supported_xcr0 & XFEATURE_MASK_XTILE_DATA)) + supported_xcr0 &= XFEATURE_MASK_XTILE_CFG; + } + return supported_xcr0; } static inline bool kvm_mpx_supported(void) -- 2.40.0.348.gf938b09366-goog