The VMX code does not track hard interrupt state correctly. The state in
tracing and lockdep is 'OFF' all the way through guest mode. From the host's
point of view this is wrong: VMENTER re-enables interrupts like a return to
user space, and VMEXIT disables them again like an entry from user space.

Make it do exactly what the enter/exit user mode paths do.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: kvm@xxxxxxxxxxxxxxx
---
 arch/x86/kvm/vmx/vmx.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6538,9 +6538,19 @@ static void vmx_vcpu_run(struct kvm_vcpu
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
 	/*
-	 * Tell context tracking that this CPU is about to enter guest mode.
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * 'interrupts disabled' when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * 1) Trace interrupts on state
+	 * 2) Prepare lockdep with RCU on
+	 * 3) Invoke context tracking if enabled to adjust RCU state
+	 * 4) Tell lockdep that interrupts are enabled
 	 */
+	__trace_hardirqs_on();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6557,9 +6567,20 @@ static void vmx_vcpu_run(struct kvm_vcpu
 	vcpu->arch.cr2 = read_cr2();
 
 	/*
-	 * Tell context tracking that this CPU is back.
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on'. Same as enter_from_user_mode().
+	 *
+	 * 1) Tell lockdep that interrupts are disabled
+	 * 2) Invoke context tracking if enabled to reactivate RCU
+	 * 3) Trace interrupts off state
+	 *
+	 * This needs to be done before the code below, as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls out
+	 * into other code as well.
 	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
 	guest_exit_irqoff();
+	__trace_hardirqs_off();
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the