The invocation of guest_enter_irqoff() is way before the actual VMENTER happens. guest_exit_irqoff() happens way after the VMEXIT. While the comment in guest_enter_irqoff() says that KVM does not hold references to RCU protected data, for the whole call chain before VMENTER and after VMEXIT there is no guarantee that no RCU protected data is accessed. There are tracepoints hidden in MSR access functions and there are calls into code which takes locks. The latter might cause lockdep to run into RCU trouble. As the call chains are hard to follow it's unclear whether there might be RCU trouble lurking in some corner cases. The VMENTER path after context tracking contains also exit conditions which abort the VMENTER. The VMEXIT return path reenables interrupts before switching RCU back on which means that the interrupt entry/exit has to switch in on and then off again. If tracepoints on local_irq_enable() and local_irqdisable() are enabled then two more RCU on/off transitions are happening. Move it down into the VMX/SVM code close to the actual entry/exit. This is the first step to bring parts of this code into the .noinstr.text section so it can be verified with objtool. Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> Cc: kvm@xxxxxxxxxxxxxxx --- arch/x86/kvm/svm.c | 18 ++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 10 ++++++++++ arch/x86/kvm/x86.c | 2 -- 3 files changed, 28 insertions(+), 2 deletions(-) --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -5766,6 +5766,15 @@ static void svm_vcpu_run(struct kvm_vcpu */ x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl); + /* + * Tell context tracking that this CPU is about to enter guest + * mode. This has to be after x86_spec_ctrl_set_guest() because + * that can take locks (lockdep needs RCU) and calls into world and + * some more. + */ + guest_enter_irqoff(); + + /* This is wrong vs. lockdep. Will be fixed in the next step */ local_irq_enable(); asm volatile ( @@ -5869,6 +5878,14 @@ static void svm_vcpu_run(struct kvm_vcpu loadsegment(gs, svm->host.gs); #endif #endif + /* + * Tell context tracking that this CPU is back. + * + * This needs to be done before the below as native_read_msr() + * contains a tracepoint and x86_spec_ctrl_restore_host() calls + * into world and some more. + */ + guest_exit_irqoff(); /* * We do not use IBRS in the kernel. If this vCPU has used the @@ -5890,6 +5907,7 @@ static void svm_vcpu_run(struct kvm_vcpu reload_tss(vcpu); + /* This is wrong vs. lockdep. Will be fixed in the next step */ local_irq_disable(); x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6537,6 +6537,11 @@ static void vmx_vcpu_run(struct kvm_vcpu */ x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0); + /* + * Tell context tracking that this CPU is about to enter guest mode. + */ + guest_enter_irqoff(); + /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); @@ -6552,6 +6557,11 @@ static void vmx_vcpu_run(struct kvm_vcpu vcpu->arch.cr2 = read_cr2(); /* + * Tell context tracking that this CPU is back. + */ + guest_exit_irqoff(); + + /* * We do not use IBRS in the kernel. If this vCPU has used the * SPEC_CTRL MSR it may have left it on; save the value and * turn it off. This is much more efficient than blindly adding --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8350,7 +8350,6 @@ static int vcpu_enter_guest(struct kvm_v } trace_kvm_entry(vcpu->vcpu_id); - guest_enter_irqoff(); fpregs_assert_state_consistent(); if (test_thread_flag(TIF_NEED_FPU_LOAD)) @@ -8413,7 +8412,6 @@ static int vcpu_enter_guest(struct kvm_v local_irq_disable(); kvm_after_interrupt(vcpu); - guest_exit_irqoff(); if (lapic_in_kernel(vcpu)) { s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta; if (delta != S64_MIN) {