Cc kvm ml,

On Thu, 18 Jul 2019 at 08:08, Thomas Lambertz <mail@xxxxxxxxxxxxxxxxx> wrote:
>
> Since kernel 5.2, I've been experiencing strange issues in my Windows 10
> QEMU/KVM guest.
> Via bisection, I have tracked down that the issue lies in the FPU state
> handling changes.
> Kernels before 8ff468c29e9a9c3afe9152c10c7b141343270bf3 work great, the
> ones afterwards are affected.
> Sometimes the state seems to be restored incorrectly in the guest.
>
> I have managed to reproduce it relatively cleanly, on a linux guest.
> (ubuntu-server 18.04, but that should not matter, since it occurred on
> windows as well)
>
> To reproduce the issue, you need prime95 (or mprime), from
> https://www.mersenne.org/download/ .
> This is just a stress test for the FPU, which helps reproduce the error
> much quicker.
>
> - Run it in the guest as 'Benchmark Only', and choose the '(2) Small
> FFTs' torture test. Give it the maximum amount of cores (for me 10).
> - On the host, run the same test. To keep my pc usable, I limited it to
> 5 cores. I do this to put some pressure on the system.
> - repeatedly focus and unfocus the qemu window
>
> With this config, errors in the guest usually occur within 30 seconds.
> Without the refocusing, it takes ~5min on average, but the variance of this
> time is quite large.
>
> The error messages are either
> "FATAL ERROR: Rounding was ......., expected less than 0.4"
> or
> "FATAL ERROR: Resulting sum was ....., expexted: ......",
> suggesting that something in the calculation has gone wrong.
>
> On the host, no errors are ever observed!

I found that the offending commit is 5f409e20b (x86/fpu: Defer FPU state
load until return to userspace), and the issue can only be reproduced when
CONFIG_PREEMPT is enabled.

Why does the commit restore the QEMU userspace FPU context to hardware
before vmentry?
https://lkml.org/lkml/2017/11/14/945

Actually, I suspect that commit f775b13eedee2 (x86,kvm: move qemu/guest FPU
switching out to vcpu_run) incorrectly saves the guest FPU state, which is
in the xsave area, into the QEMU userspace FPU buffer. However, Rik replied
in https://lkml.org/lkml/2017/11/14/891:

"The scheduler will save the guest fpu context when a vCPU thread is
preempted, and restore it when it is scheduled back in."

But I can't find any scheduler code that does this.

In addition, the patch below makes the mprime errors go away. (I am still
not sure it is correct.)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf..18f928e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_x86_ops->vcpu_load(vcpu, cpu);
 
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 	/* Apply any externally detected TSC adjustments (due to suspend) */
 	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
 		adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,10 +7993,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	trace_kvm_entry(vcpu->vcpu_id);
 	guest_enter_irqoff();
 
-	fpregs_assert_state_consistent();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_return();
-
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);