Sean,

On Tue, Nov 16 2021 at 16:05, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Thomas Gleixner wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2686f2edb47c..9425fdbb4806 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>      vcpu->arch.last_vmentry_cpu = vcpu->cpu;
>>      vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>
>> +    kvm_update_guest_xfd_state();
>
> Is there a reason the XFD switch can't key off TIF_NEED_FPU_LOAD a la the other
> FPU stuff?  I.e. piggyback this snippet in vcpu_enter_guest():

TIF_NEED_FPU_LOAD is not set here.

>     if (test_thread_flag(TIF_NEED_FPU_LOAD))
>         switch_fpu_return();

Assume the guest has control of XFD and XFD writes are not trapped. That
means on vmexit the XFD state of the guest is unknown.

vcpu_run()
  kvm_load_guest_fpu()
    wrmsrl(XFD, guest_fpstate->xfd);
    XRSTORS

  do {

     local_irq_disable();

     // Covers the case of softirq usage and preemption
     if (test_thread_flag(TIF_NEED_FPU_LOAD))
         switch_fpu_return()
           wrmsrl(XFD, guest_fpstate->xfd);

     do {
        vmenter();              // Guest modifies XFD
     } while (reenter);

     local_irq_enable();     <- Problem starts here

     preempt_enable();       <- Becomes wider here

  } while (!breakout);

  kvm_put_guest_fpu();          // Switch back to user FPU state

So we have the following cases:

guest_fpstate.xfd       XFD at vmexit

       0                    0      // consistent state
       1                    0      // inconsistent state
       0                    1      // inconsistent state
       1                    1      // consistent state

Now assume that after reenabling interrupts an interrupt/softirq happens
which uses the FPU. It will save the correct state because XFD is still
guest state, but the subsequent restore will operate on the stale
guest_fpstate.xfd value.

Same problem vs. schedule() after reenabling preemption or, if not
preempted, in kvm_put_guest_fpu().

Now you could argue that the interrupt/softirq XSAVES should also read
the XFD MSR and save it in guest_fpstate.xfd.
Same in schedule() and kvm_put_guest_fpu(), i.e.:

      XSAVES
      if (fpstate->is_guest) {
            rdmsrl(XFD, xfd);
            fpstate->xfd = xfd;
            __this_cpu_write(..., xfd);
      }

We can do that, but I'm unhappy about this conditional in schedule(). So
I was asking for a simple KVM-only solution first:

vcpu_run()
  kvm_load_guest_fpu()
    wrmsrl(XFD, guest_fpstate->xfd);
    XRSTORS

  do {

     local_irq_disable();

     if (test_thread_flag(TIF_NEED_FPU_LOAD))
         switch_fpu_return()
           wrmsrl(XFD, guest_fpstate->xfd);

     do {
        vmenter();              // Guest modifies XFD
     } while (reenter);

     update_xfd_state();          // Restore consistency

     local_irq_enable();

and check how bad that is for KVM in terms of overhead on AMX systems.

If it really matters we can look at the conditional in the XSAVES path.

Thanks,

        tglx