On Fri, 2023-11-24 at 00:53 -0500, Yang Weijiang wrote: > From: Sean Christopherson <seanjc@xxxxxxxxxx> > > When granting userspace or a KVM guest access to an xfeature, preserve the > entity's existing supervisor and software-defined permissions as tracked > by __state_perm, i.e. use __state_perm to track *all* permissions even > though all supported supervisor xfeatures are granted to all FPUs and > FPU_GUEST_PERM_LOCKED disallows changing permissions. > > Effectively clobbering supervisor permissions results in inconsistent > behavior, as xstate_get_group_perm() will report supervisor features for > process that do NOT request access to dynamic user xfeatures, whereas any > and all supervisor features will be absent from the set of permissions for > any process that is granted access to one or more dynamic xfeatures (which > right now means AMX). > > The inconsistency isn't problematic because fpu_xstate_prctl() already > strips out everything except user xfeatures: > > case ARCH_GET_XCOMP_PERM: > /* > * Lockless snapshot as it can also change right after the > * dropping the lock. > */ > permitted = xstate_get_host_group_perm(); > permitted &= XFEATURE_MASK_USER_SUPPORTED; > return put_user(permitted, uptr); > > case ARCH_GET_XCOMP_GUEST_PERM: > permitted = xstate_get_guest_group_perm(); > permitted &= XFEATURE_MASK_USER_SUPPORTED; > return put_user(permitted, uptr); > > and similarly KVM doesn't apply the __state_perm to supervisor states > (kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()): > > case 0xd: { > u64 permitted_xcr0 = kvm_get_filtered_xcr0(); > u64 permitted_xss = kvm_caps.supported_xss; > > But if KVM in particular were to ever change, dropping supervisor > permissions would result in subtle bugs in KVM's reporting of supported > CPUID settings. And the above behavior also means that having supervisor > xfeatures in __state_perm is correctly handled by all users. > > Dropping supervisor permissions also creates another landmine for KVM. If > more dynamic user xfeatures are ever added, requesting access to multiple > xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the > second invocation of __xstate_request_perm() computing the wrong ksize, as > as the mask passed to xstate_calculate_size() would not contain *any* > supervisor features. > > Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE > permissions") fudged around the size issue for userspace FPUs, but for > reasons unknown skipped guest FPUs. Lack of a fix for KVM "works" only > because KVM doesn't yet support virtualizing features that have supervisor > xfeatures, i.e. as of today, KVM guest FPUs will never need the relevant > xfeatures. > > Simply extending the hack-a-fix for guests would temporarily solve the > ksize issue, but wouldn't address the inconsistency issue and would leave > another lurking pitfall for KVM. KVM support for virtualizing CET will > likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not > be set in xfeatures_mask_supervisor() and would again be dropped when > granting access to dynamic xfeatures. > > Note, the existing clobbering behavior is rather subtle. The @permitted > parameter to __xstate_request_perm() comes from: > > permitted = xstate_get_group_perm(guest); > > which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm, > where __state_perm is initialized to: > > fpu->perm.__state_perm = fpu_kernel_cfg.default_features; > > and copied to the guest side of things: > > /* Same defaults for guests */ > fpu->guest_perm = fpu->perm; > > fpu_kernel_cfg.default_features contains everything except the dynamic > xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA: > > fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features; > fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC; > > When __xstate_request_perm() restricts the local "mask" variable to > compute the user state size: > > mask &= XFEATURE_MASK_USER_SUPPORTED; > usize = xstate_calculate_size(mask, false); > > it subtly overwrites the target __state_perm with "mask" containing only > user xfeatures: > > perm = guest ? &fpu->guest_perm : &fpu->perm; > /* Pairs with the READ_ONCE() in xstate_get_group_perm() */ > WRITE_ONCE(perm->__state_perm, mask); > > Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > Cc: Weijiang Yang <weijiang.yang@xxxxxxxxx> > Cc: Dave Hansen <dave.hansen@xxxxxxxxx> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Cc: Chao Gao <chao.gao@xxxxxxxxx> > Cc: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> > Cc: John Allen <john.allen@xxxxxxx> > Cc: kvm@xxxxxxxxxxxxxxx > Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@xxxxxxxxxx > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > arch/x86/kernel/fpu/xstate.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c > index ef6906107c54..73f6bc00d178 100644 > --- a/arch/x86/kernel/fpu/xstate.c > +++ b/arch/x86/kernel/fpu/xstate.c > @@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest) > if ((permitted & requested) == requested) > return 0; > > - /* Calculate the resulting kernel state size */ > + /* > + * Calculate the resulting kernel state size. Note, @permitted also > + * contains supervisor xfeatures even though supervisor are always > + * permitted for kernel and guest FPUs, and never permitted for user > + * FPUs. > + */ > mask = permitted | requested; > - /* Take supervisor states into account on the host */ > - if (!guest) > - mask |= xfeatures_mask_supervisor(); > ksize = xstate_calculate_size(mask, compacted); > > - /* Calculate the resulting user state size */ > - mask &= XFEATURE_MASK_USER_SUPPORTED; > - usize = xstate_calculate_size(mask, false); > + /* > + * Calculate the resulting user state size. Take care not to clobber > + * the supervisor xfeatures in the new mask! > + */ > + usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false); > > if (!guest) { > ret = validate_sigaltstack(usize); Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Best regards, Maxim Levitsky