The single line summary is now out of date - there's no new masking. On Thursday, 2022-02-17 at 02:30:29 -03, Leonardo Bras wrote: > During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel > swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). > > When xsave feature is available, the fpu swap is done by: > - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used > to store the current state of the fpu registers to a buffer. > - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & > XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. > > For xsave(s) the mask is used to limit what parts of the fpu regs will > be copied to the buffer. Likewise on xrstor(s), the mask is used to > limit what parts of the fpu regs will be changed. > > The mask for xsave(s), the guest's fpstate->xfeatures, is defined on > kvm_arch_vcpu_create(), which (in summary) sets it to all features > supported by the cpu which are enabled on kernel config. > > This means that xsave(s) will save to guest buffer all the fpu regs > contents the cpu has enabled when the guest is paused, even if they > are not used. > > This would not be an issue, if xrstor(s) would also do that. > > xrstor(s)'s mask for host/guest swap is basically every valid feature > contained in kernel config, except XFEATURE_MASK_PKRU. > Accordingto kernel src, it is instead switched in switch_to() and > flush_thread(). > > Then, the following happens with a host supporting PKRU starts a > guest that does not support it: > 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, > 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) > 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) > 4 - guest runs, then switch back to host, > 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) > 6 - xrstor(s) host fpstate to fpu regs. > 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with > XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) > > On 5, even though the guest does not support PKRU, it does have the flag > set on guest fpstate, which is transferred to userspace via vcpu ioctl > KVM_GET_XSAVE. > > This becomes a problem when the user decides on migrating the above guest > to another machine that does not support PKRU: > The new host restores guest's fpu regs to as they were before (xrstor(s)), > but since the new host don't support PKRU, a general-protection exception > ocurs in xrstor(s) and that crashes the guest. > > This can be solved by making the guest's fpstate->user_xfeatures hold > a copy of guest_supported_xcr0. This way, on 7 the only flags copied to > userspace will be the ones compatible to guest requirements, and thus > there will be no issue during migration. > > As a bonus, it will also fail if userspace tries to set fpu features > that are not compatible to the guest configuration. (KVM_SET_XSAVE ioctl) > > Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures, > there is not need to set it in kvm_check_cpuid(). So, change > fpstate_realloc() so it does not touch fpstate->user_xfeatures if a > non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid() > calls it. > > Signed-off-by: Leonardo Bras <leobras@xxxxxxxxxx> > --- > arch/x86/kernel/fpu/xstate.c | 5 ++++- > arch/x86/kvm/cpuid.c | 2 ++ > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c > index 02b3ddaf4f75..7c7824ae7862 100644 > --- a/arch/x86/kernel/fpu/xstate.c > +++ b/arch/x86/kernel/fpu/xstate.c > @@ -1558,7 +1558,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize, > fpregs_restore_userregs(); > > newfps->xfeatures = curfps->xfeatures | xfeatures; > - newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; > + > + if (!guest_fpu) > + newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; > + > newfps->xfd = curfps->xfd & ~xfeatures; > > /* Do the final updates within the locked region */ > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 494d4d351859..71125291c578 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -296,6 +296,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > vcpu->arch.guest_supported_xcr0 = > cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent); > > + vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0; > + > kvm_update_pv_runtime(vcpu); > > vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); dme. -- All those lines and circles, to me, a mystery.