> From: Sean Christopherson <seanjc@xxxxxxxxxx> > Sent: Thursday, December 30, 2021 12:55 AM > > On Wed, Dec 29, 2021, Yang Zhong wrote: > > From: Jing Liu <jing2.liu@xxxxxxxxx> > > > > Guest xstate permissions should be set by userspace VMM before vcpu > > creation. Extend KVM_SET_CPUID2 to verify that every feature reported > > in CPUID[0xD] has proper permission set. If permission allows, enable > > all xfeatures in guest cpuid with fpstate buffer sized accordingly. > > > > This avoids introducing new KVM exit reason for reporting permission > > violation to userspace VMM at run-time and also removes the need of > > tricky fpstate buffer expansion in the emulation and restore path of > > XCR0 and IA32_XFD MSR. > > How so? __do_cpuid_func() restricts what is advertised to userspace based > on > xstate_get_guest_group_perm(), so it's not like KVM is advertising something > it > can't provide? There should never be any danger to KVM that's mitigated by > restricing guest CPUID because KVM can and should check vcpu- > >arch.guest_fpu.perm > instead of guest CPUID. Well, above explains why we choose to expand fpstate buffer at KVM_SET_CPUID2 instead of in the emulation path when required permissions have been set, as discussed here: https://lore.kernel.org/all/20211214024948.048572883@xxxxxxxxxxxxx/ > > In other words, I believe you're conflating the overall approach of requiring > userspace to pre-acquire the necessary permissions with enforcing what > userspace > advertises to the guest. > > > Signed-off-by: Jing Liu <jing2.liu@xxxxxxxxx> > > Signed-off-by: Kevin Tian <kevin.tian@xxxxxxxxx> > > Signed-off-by: Yang Zhong <yang.zhong@xxxxxxxxx> > > --- > > arch/x86/kvm/cpuid.c | 62 +++++++++++++++++++++++++++++++++---------- > - > > 1 file changed, 47 insertions(+), 15 deletions(-) > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > index 4855344091b8..acbc10db550e 100644 > > --- a/arch/x86/kvm/cpuid.c > > +++ b/arch/x86/kvm/cpuid.c > > @@ -81,9 +81,12 @@ static inline struct kvm_cpuid_entry2 > *cpuid_entry2_find( > > return NULL; > > } > > > > -static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent) > > +static int kvm_check_cpuid(struct kvm_vcpu *vcpu, > > + struct kvm_cpuid_entry2 *entries, > > + int nent) > > { > > struct kvm_cpuid_entry2 *best; > > + int r = 0; > > > > /* > > * The existing code assumes virtual address is 48-bit or 57-bit in the > > @@ -93,11 +96,40 @@ static int kvm_check_cpuid(struct > kvm_cpuid_entry2 *entries, int nent) > > if (best) { > > int vaddr_bits = (best->eax & 0xff00) >> 8; > > > > - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) > > - return -EINVAL; > > + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) { > > + r = -EINVAL; > > + goto out; > > Please don't change this to a goto, a return is perfectly ok and more readable > as it doesn't imply there's some functional change that needs to be unwound > at > the end. will fix > > > + } > > } > > > > - return 0; > > + /* > > + * Check guest permissions for dynamically-enabled xfeatures. > > + * Userspace VMM is expected to acquire permission before vCPU > > + * creation. If permission allows, enable all xfeatures with > > + * fpstate buffer sized accordingly. This avoids complexity of > > + * run-time expansion in the emulation and restore path of XCR0 > > + * and IA32_XFD MSR. > > + */ > > + best = cpuid_entry2_find(entries, nent, 0xd, 0); > > + if (best) { > > + u64 xfeatures; > > + > > + xfeatures = best->eax | ((u64)best->edx << 32); > > + if (xfeatures & ~vcpu->arch.guest_fpu.perm) { > > + r = -ENXIO; > > ENXIO is a rather odd error code for insufficient permissions, especially since > the FPU returns -EPERM for what is effectively the same check. > > > + goto out; > > + } > > + > > + if (xfeatures != vcpu->arch.guest_fpu.xfeatures) { > > xfeatures is obviously not consumed anywhere, which is super confusing and > arguably wrong, e.g. if userspace advertises xfeatures that are a subset of > vcpu->arch.guest_fpu.perm, this will expand XSAVE state beyond what > userspace > actually wants to advertise to the guest. The really confusing case would be > if > userspace reduced xfeatures relative to vcpu->arch.guest_fpu.xfeatures and > got > an -ENOMEM due to the FPU failing to expand the XSAVE size. You are right. > > I don't care about the waste of memory, and IIUC userspace would have to > intentionally request permissions for the guest that it then ignores, but that > doesn't make the code any less confusing. And as written, this check also > prevents > advertising non-XFD features that are not supported in hardware. I doubt > there's > a production use case for that (though MPX deprecation comes close), but > I've > certainly exposed unsupported features to a guest for testing purposes. > > Rather than bleed details from the FPU into KVM, why not have the FPU do > any and > all checks? That also gives the FPU access to requested xfeatures so that it > can opportunistically avoid unnecessary expansion. We can also tweak the > kernel > APIs to be more particular about input values. All above makes sense, especially when we combine permission check and buffer expansion in one step now. > > At that point, I would be ok with fpu_update_guest_perm_features() > rejecting > attempts to advertise features that are not permitted, because then it's an > FPU > policy, not a KVM policy, and there's a valid reason for said policy. It's a bit > of a pedantic distinction, but to me it matters because having KVM explicitly > restrict guest CPUID implies that doing so is necessary for KVM correctness, > which > AFAICT is not the case. > > E.g. in KVM > > /* > * Exposing dynamic xfeatures to the guest requires additional > enabling > * in the FPU, e.g. to expand the guest XSAVE state size. > */ > best = cpuid_entry2_find(entries, nent, 0xd, 0); > if (!best) > return 0; > > xfeatures = best->eax | ((u64)best->edx << 32); > xfeatures &= XFEATURE_MASK_USER_DYNAMIC; > if (!xfeatures) > return 0; > > return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu, > xfeatures); > > and then > > int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 > xfeatures) > { > lockdep_assert_preemption_enabled(); > > /* Nothing to do if all requested features are already enabled. */ > xfeatures &= ~guest_fpu->xfeatures; > if (!xfeatures) > return 0; > > /* Dynamic xfeatures are not supported with 32-bit kernels. */ > if (!IS_ENABLED(CONFIG_X86_64)) > return -EPERM; > > return __xfd_enable_feature(xfeatures, guest_fpu); > } > > with > > int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu) > { > struct fpu_state_perm *perm; > unsigned int ksize, usize; > struct fpu *fpu; > > if (WARN_ON_ONCE(!xfd_err || (xfd_err & > ~XFEATURE_MASK_USER_DYNAMIC))) > return 0; Currently this is done as: int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu) { u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC; ... if (!xfd_event) { if (!guest_fpu) pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err); return 0; } ... } is it necessary to convert the error print to WARN_ON() (and also apply to guest_fpu)? > > ... > } > > which addresses several things: > > a) avoids explicitly restricing guest CPUID in KVM, and in particular doesn't > prevent userspace from advertising non-XFD features that aren't > supported in > hardware, which for better or worse is allowed today. > > b) returns -EPERM instead of '0' when userspace attempts to enable > dynamic > xfeatures with 32-bit kernels, which isn't a bug as posted only because > KVM pre-checks vcpu->arch.guest_fpu.perm. > > b) avoids reading guest_perm outside of siglock, which was technically a > TOCTOU > "bug", though it didn't put the kernel at risk because > __xstate_request_perm() > doesn't allow reducing permissions. > > c) allows __xfd_enable_feature() to require the caller to provide just XFD > features > All the sample code looks good to me, except WARN_ON() introduced in the last bullet. If no objection from other maintainers, we can incorporate it in next version. Thanks Kevin