> On 12/10/21 02:00, Thomas Gleixner wrote: > > Swapping the host/guest FPU is directly fiddling with FPU internals > > which requires 5 exports. The upcoming support of dymanically enabled > > states would even need more. > > > > Implement a swap function in the FPU core code and export that instead. > > > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Cc: kvm@xxxxxxxxxxxxxxx > > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > --- > > arch/x86/include/asm/fpu/api.h | 8 +++++ > > arch/x86/include/asm/fpu/internal.h | 15 +--------- > > arch/x86/kernel/fpu/core.c | 30 ++++++++++++++++++--- > > arch/x86/kernel/fpu/init.c | 1 > > arch/x86/kernel/fpu/xstate.c | 1 > > arch/x86/kvm/x86.c | 51 +++++++----------------------------- > > arch/x86/mm/extable.c | 2 - > > 7 files changed, 48 insertions(+), 60 deletions(-) > > When looking into the tglx/devel.git x86/fpu for the full #1-#4 series and the KVM AMX support, I'd like to talk two things as follows, 1. KVM dynamic allocation API: Since KVM also uses dynamic allocation, after KVM detects guest requesting AMX by #NM trap, KVM need alloc extra buffer for this vcpu's current->thread.fpu.fpstate and guest_fpu related. So far, the kernel itself has such API like fpstate_realloc(), but it's static. How about making a common function usable for KVM? 2. There exists a case that *guest AMX state can be lost*: After KVM passthrough XFD to guest, when vmexit opening irq window and KVM is interrupted, kernel softirq path can call kernel_fpu_begin() to touch xsave state. This function does XSAVES. If guest XFD[18] is 1, and with guest AMX state in register, then guest AMX state is lost by XSAVES. The detailed example call trace in commit commit 2620fe268e80d667a94553cd37a94ccaa2cb8c83 Author: Sean Christopherson <seanjc@xxxxxxxxxx> Date: Fri Jan 17 11:30:51 2020 -0800 KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest" Reload the current thread's FPU state, which contains the guest's FPU state, to the CPU registers if necessary during vcpu_enter_guest(). TIF_NEED_FPU_LOAD can be set any time control is transferred out of KVM, e.g. if I/O is triggered during a KVM call to get_user_pages() or if a softirq occurs while KVM is scheduled in. ... A sample trace triggered by warning if TIF_NEED_FPU_LOAD is set while vcpu state is loaded: <IRQ> gcmaes_crypt_by_sg.constprop.12+0x26e/0x660 ? 0xffffffffc024547d ? __qdisc_run+0x83/0x510 ? __dev_queue_xmit+0x45e/0x990 ... ? do_IRQ+0x7f/0xd0 ? common_interrupt+0xf/0xf </IRQ> ? irq_entries_start+0x20/0x660 ? vmx_get_interrupt_shadow+0x2f0/0x710 [kvm_intel] ? kvm_set_msr_common+0xfc7/0x2380 [kvm] ? recalibrate_cpu_khz+0x10/0x10 ? ktime_get+0x3a/0xa0 ? kvm_arch_vcpu_ioctl_run+0x107/0x560 [kvm] ? kvm_init+0x6bf/0xd00 [kvm] For this case, I think one way is kernel doing something before XSAVES for KVM thread; another way is let KVM fix: maintaining a zero XFD value (by current->state.fpu.fpstate->xfd = 0) after vcpu fpu state is loaded and restore real guest XFD value before vmenter. Logic as follows. after vmexit: if XFD is passthrough then sync guest XFD to vmx->xfd; set XFD to current->state.fpu.fpstate->xfd (= 0) __this_cpu_write(xfd_state, 0); before vmenter (irq is disabled): if passthrough then restore to real guest XFD by vmx->xfd; vcpu_run: (if XFD is passthrough) load: swap from qemu's to a zero XFD put: swap zero to qemu's Thanks, Jing [...]