On Tue, Nov 19, 2019 at 12:01:33PM -0800, Sean Christopherson wrote:
> On Wed, Oct 30, 2019 at 11:44:09PM -0400, Derek Yerger wrote:
> > I noticed the following in the host kernel log around the time the guest
> > encountered BSOD on 5.2.7:
> >
> > [  337.841491] WARNING: CPU: 6 PID: 7548 at arch/x86/kvm/x86.c:7963
> > kvm_arch_vcpu_ioctl_run+0x19b1/0x1b00 [kvm]
>
> Rats, I overlooked this first time round. In the future, if you get a
> WARN splat, try to make it very obvious in the bug report; they're almost
> always a smoking gun.
>
> The WARN that fired is:
>
> 	/* The preempt notifier should have taken care of the FPU already.  */
> 	WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
>
> which was added as part of a bug fix by commit:
>
>   240c35a3783a ("kvm: x86: Use task structs fpu field for user")
>
> the buggy commit that was fixed is
>
>   5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")
>
> which was part of an FPU rewrite that went into 5.2[*]. So yep, big
> smoking gun :-)
>
> My understanding of the WARN is that it means the kernel's FPU state is
> unexpectedly loaded when entry to the KVM guest is imminent. As for *how*
> the kernel's FPU state is getting loaded, no clue. But, I think it'd be
> pretty easy to find the culprit by adding a debug flag into struct
> thread_info that gets set in vcpu_load() and cleared in vcpu_put(),
> and then WARN in set_ti_thread_flag() if the debug flag is true when
> TIF_NEED_FPU_LOAD is being set. I'll put together a debugging patch later
> today and send it your way.

Debug patch attached. Hopefully it finds something; it took me an
embarrassing number of attempts to get it correct. I kept screwing up
checking a bit number versus checking a bit mask...
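For readers following the debugging idea described above, here is a minimal user-space model of it in C. It is a sketch under stated assumptions, not the kernel patch: the struct layout, the bit value 14 for TIF_NEED_FPU_LOAD, the warn_on_once() stand-in for WARN_ON_ONCE(), the fpu_debug_hits counter, and the model_vcpu_load()/model_vcpu_put() helpers are all hypothetical. It shows the core trick: a per-thread boolean that is true only between load and put, checked inside the single helper through which the flag gets set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Illustrative bit number only (an assumption for this sketch); the real
 * TIF_NEED_FPU_LOAD is defined in arch/x86/include/asm/thread_info.h.
 */
#define TIF_NEED_FPU_LOAD	14
#define FLAG_BITS		(8 * (int)sizeof(unsigned long))

struct thread_info {
	unsigned long flags;	/* low level flags, indexed by bit number */
	bool vcpu_loaded;	/* debug-only: set on vCPU load, cleared on put */
};

/* How many times the debug condition was hit (only the first is printed). */
static int fpu_debug_hits;

/* Hypothetical user-space stand-in for WARN_ON_ONCE(). */
static bool warn_on_once(bool cond, const char *msg)
{
	if (cond && fpu_debug_hits++ == 0)
		fprintf(stderr, "WARNING: %s\n", msg);
	return cond;
}

/* Model of the hooked set_ti_thread_flag(): 'flag' is a bit number. */
static void set_ti_thread_flag(struct thread_info *ti, int flag)
{
	assert(flag >= 0 && flag < FLAG_BITS);
	warn_on_once(ti->vcpu_loaded && flag == TIF_NEED_FPU_LOAD,
		     "TIF_NEED_FPU_LOAD set while a vCPU is loaded");
	ti->flags |= 1UL << flag;
}

/* Hypothetical stand-ins for the kvm_arch_vcpu_load()/put() hooks. */
static void model_vcpu_load(struct thread_info *ti)
{
	ti->vcpu_loaded = true;
}

static void model_vcpu_put(struct thread_info *ti)
{
	ti->vcpu_loaded = false;
}
```

In this model, setting TIF_NEED_FPU_LOAD before load or after put is silent, setting an unrelated bit while loaded is silent, and setting TIF_NEED_FPU_LOAD while loaded bumps fpu_debug_hits and prints one warning. Like the patch, it only catches writes that go through set_ti_thread_flag(); anything that modifies the flags word by another route would slip past.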
From 6288031dacbe753b84515d330f62c1f8ed31d932 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Date: Wed, 20 Nov 2019 10:12:56 -0800
Subject: [PATCH] thread_info: Add a debug hook to detect FPU changes while a vCPU is loaded

Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
 arch/x86/include/asm/thread_info.h | 2 ++
 arch/x86/kvm/x86.c                 | 4 ++++
 include/linux/thread_info.h        | 1 +
 3 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f9453536f9bb..7b697005cc51 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -56,6 +56,8 @@ struct task_struct;
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	u32			status;		/* thread synchronous flags */
+	bool			vcpu_loaded;
+
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8ad3a4d86b1..3d9c049e749e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3303,6 +3303,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}
 
 	kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
+
+	current_thread_info()->vcpu_loaded = 1;
 }
 
 static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
@@ -3322,6 +3324,8 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	int idx;
 
+	current_thread_info()->vcpu_loaded = 0;
+
 	if (vcpu->preempted)
 		vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
 
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 8d8821b3689a..016c2c887354 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -52,6 +52,7 @@ enum {
 
 static inline void set_ti_thread_flag(struct thread_info *ti, int flag)
 {
+	WARN_ON_ONCE(ti->vcpu_loaded && flag == TIF_NEED_FPU_LOAD);
 	set_bit(flag, (unsigned long *)&ti->flags);
 }
 
-- 
2.24.0
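The patch compares flag against TIF_NEED_FPU_LOAD, a bit number, because set_ti_thread_flag() is passed bit numbers. The kernel also defines _TIF_* masks for testing a whole flags word at once, and mixing the two up is the class of mistake the email mentions. The sketch below uses illustrative values (assumptions, not copied from the kernel headers) and hypothetical function names to contrast the correct check with two plausible wrong ones; the source does not say which mistakes were actually made.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values (assumptions), following the TIF_x / _TIF_x convention. */
#define TIF_NEED_FPU_LOAD	14				/* bit number */
#define _TIF_NEED_FPU_LOAD	(1UL << TIF_NEED_FPU_LOAD)	/* bit mask */

/* Correct: set_ti_thread_flag() is passed a bit number, so compare numbers. */
static bool matches_by_number(int flag)
{
	assert(flag >= 0 && flag < 8 * (int)sizeof(unsigned long));
	return flag == TIF_NEED_FPU_LOAD;
}

/* Wrong: no valid bit number equals the mask (16384 here), so this never fires. */
static bool matches_mask_buggy(int flag)
{
	return (unsigned long)flag == _TIF_NEED_FPU_LOAD;
}

/*
 * Wrong: ANDing with the bit number matches any flag that shares a bit with
 * 14 (0b1110), such as 2, 4 or 6, so it produces false positives.
 */
static bool matches_and_buggy(int flag)
{
	return (flag & TIF_NEED_FPU_LOAD) != 0;
}

/* Masks are meant for testing the whole flags word at once. */
static bool fpu_load_pending(unsigned long flags)
{
	return (flags & _TIF_NEED_FPU_LOAD) != 0;
}
```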