kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore it cannot sleep. Writing to guest memory is therefore forbidden, but it can happen on AMD processors if kvm_check_nested_events() causes a vmexit. Fortunately, all events that are caught by kvm_check_nested_events() are also recognized by kvm_vcpu_has_events() through vendor callbacks such as kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so remove the call and postpone the actual processing to vcpu_block(). Opportunistically honor the return of kvm_check_nested_events(). KVM punted on the check in kvm_vcpu_running() because the only error path is if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits to userspace with "internal error" i.e. the VM is likely dead anyways so it wasn't worth overloading the return of kvm_vcpu_running(). Add the check mostly so that KVM is consistent with itself; the return of the call via kvm_apic_accept_events()=>kvm_check_nested_events() that immediately follows _is_ checked. Reported-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> [sean: check and handle return of kvm_check_nested_events()] Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/x86.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dcc675d4e44b..8aeacbc2bff9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10815,6 +10815,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu) return 1; } + /* + * Evaluate nested events before exiting the halted state. This allows + * the halt state to be recorded properly in the VMCS12's activity + * state field (AMD does not have a similar field and a VM-Exit always + * causes a spurious wakeup from HLT). + */ + if (is_guest_mode(vcpu)) { + if (kvm_check_nested_events(vcpu) < 0) + return 0; + } + if (kvm_apic_accept_events(vcpu) < 0) return 0; switch(vcpu->arch.mp_state) { @@ -10837,9 +10848,6 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu) static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) - kvm_check_nested_events(vcpu); - return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted); } This commit breaks delivery of a (virtualized) posted interrupt from an L1 vCPU to a halted L2 vCPU. Looking back at commit e6c67d8cf117 ("KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv"), Liran wrote: Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Clearly, that is no longer the case.