Folks,

this is a follow-up to the initial sketch of patches which got picked up by
Jing and have been posted in combination with the KVM parts:

   https://lore.kernel.org/r/20211208000359.2853257-1-yang.zhong@xxxxxxxxx

This update only touches the x86/fpu code and does not change anything on
the KVM side.

   BIG FAT WARNING: This is compile tested only!

In the course of the discussion of the above patchset it turned out that
there are a few conceptual issues vs. hardware and software state and also
vs. guest restore.

This series addresses this with the following changes vs. the original
approach:

  1) fpstate reallocation is now independent of fpu_swap_kvm_fpstate()

     It is triggered directly via XSETBV and XFD MSR write emulation which
     are used both for runtime and restore purposes.

     For this it provides two wrappers around a common update function, one
     for XCR0 and one for XFD.

     Both check the validity of the arguments and the correct sizing of the
     guest FPU fpstate. If the size is not sufficient, fpstate is
     reallocated.

     The functions can fail.

  2) XFD synchronization

     KVM must neither touch the XFD MSR nor the fpstate->xfd software state
     in order to guarantee state consistency.

     In the MSR write emulation case the XFD specific update handler has to
     be invoked. See #1.

     If MSR write emulation is disabled because the buffer size is
     sufficient for all use cases, i.e.:

             guest_fpu::xfeatures == guest_fpu::perm

     then there is no guarantee that the XFD software state on VMEXIT is
     the same as the state on VMENTER.

     A separate synchronization function is provided which reads the XFD
     MSR and updates the relevant software state. This function has to be
     invoked after a VMEXIT before reenabling interrupts.

With that the KVM logic looks like this:

     xsetbv_emulate()
        ret = fpu_update_guest_xcr0(&vcpu->arch.guest_fpu, xcr0);
        if (ret)
                handle_fail()
        ....

     kvm_emulate_wrmsr()
        ....
        case MSR_IA32_XFD:
             ret = fpu_update_guest_xfd(&vcpu->arch.guest_fpu, vcpu->arch.xcr0, msrval);
             if (ret)
                handle_fail()
             ....

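To make the wrapper design from #1 a bit more concrete, here is a minimal
user-space sketch in C. It is not the code from the series: struct
guest_fpu_model, model_size(), model_update() and the two model_update_*()
wrappers are hypothetical names, the 64-bytes-per-feature sizing rule is
invented, and the XFD validity check is an assumption. Only the shape
follows the description: two wrappers around one common update function
which validates the arguments, reallocates when the buffer is too small,
and can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the guest FPU container. */
struct guest_fpu_model {
	uint64_t perm;      /* features the guest may ever enable */
	uint64_t xfeatures; /* features the current buffer is sized for */
	uint64_t xfd;       /* software copy of the XFD value */
	size_t size;        /* current buffer size in bytes */
	void *buf;          /* stand-in for the fpstate buffer */
};

/* Toy sizing rule (assumption): 64 bytes per set feature bit. */
static size_t model_size(uint64_t features)
{
	size_t n = 0;

	for (; features; features &= features - 1)
		n++;
	return n * 64;
}

/* Common update: validate the arguments, then grow the buffer if needed. */
static int model_update(struct guest_fpu_model *g, uint64_t xcr0, uint64_t xfd)
{
	uint64_t needed, grown;
	void *nbuf;

	assert(g != NULL);

	/* Only permitted features may show up in XCR0. */
	if (xcr0 & ~g->perm)
		return -EPERM;

	/* Assumption: XFD may only name permitted features. */
	if (xfd & ~g->perm)
		return -EINVAL;

	/* Features enabled in XCR0 and not disarmed via XFD need space. */
	needed = xcr0 & ~xfd;
	if (needed & ~g->xfeatures) {
		grown = g->xfeatures | needed;
		nbuf = realloc(g->buf, model_size(grown));
		if (!nbuf)
			return -ENOMEM; /* the update can fail */
		g->buf = nbuf;
		g->size = model_size(grown);
		g->xfeatures = grown;
	}

	g->xfd = xfd;
	return 0;
}

/* Wrapper for XSETBV emulation: new XCR0, current XFD. */
static int model_update_xcr0(struct guest_fpu_model *g, uint64_t xcr0)
{
	return model_update(g, xcr0, g->xfd);
}

/* Wrapper for XFD MSR write emulation: current XCR0, new XFD. */
static int model_update_xfd(struct guest_fpu_model *g, uint64_t xcr0,
			    uint64_t xfd)
{
	return model_update(g, xcr0, xfd);
}
```

In this model, setting a feature in XCR0 while it is still disarmed via XFD
does not grow the buffer. Clearing the XFD bit afterwards does, and that is
the point where an allocation failure would have to be handled by the
caller.
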
These two emulation hooks cover both the case of a running vCPU and the
case of restore.

The XFD synchronization mechanism is only relevant for a running vCPU after
VMEXIT when XFD MSR write emulation is disabled:

     vcpu_run()
       vcpu_enter_guest()
         for (;;) {
             ...
             vmenter();
             ...
         };
         ...

         if (!xfd_write_emulated(vcpu))
                 fpu_sync_guest_vmexit_xfd_state();

         local_irq_enable();

It has no relevance for the guest restore case.

With that all XFD/fpstate related issues should be covered in a consistent
way.

CPUID validation can be done without exporting yet more FPU functions:

           if (requested_xfeatures & ~vcpu->arch.guest_fpu.perm)
                   return -ENOPONY;

That's been the purpose of fpu_guest::perm from the beginning, along with
fpu_guest::xfeatures for other validation purposes.

XFD_ERR MSR handling is completely separate and, as discussed, a KVM only
issue for now. KVM has to ensure that the MSR is 0 before interrupts are
enabled. So this is not touched here.

The only remaining issue is the KVM XSTATE save/restore size checking which
probably requires some FPU core assistance. But that requires some more
thought vs. the IOCTL interface extension and once that is settled it
needs to be solved in one go. But that's an orthogonal issue to the above.

The series is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm

Thanks,

	tglx
---
 include/asm/fpu/api.h    |   63 ++++++++++++++++++++++++
 include/asm/fpu/types.h  |   22 ++++++++
 include/uapi/asm/prctl.h |   26 +++++----
 kernel/fpu/core.c        |  123 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/fpu/xstate.c      |  118 +++++++++++++++++++++++++++------------------
 kernel/fpu/xstate.h      |   20 ++++++-
 kernel/process.c         |    2
 7 files changed, 307 insertions(+), 67 deletions(-)