Re: [PATCH v2] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2

2018-08-30 15:39+0300, Liran Alon:
> 
> 
> > On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> > 
> > 2018-08-30 12:57+0300, Liran Alon:
> >> Consider the case where L1 had a pending IRQ/NMI event when it
> >> executed VMLAUNCH/VMRESUME which wasn't delivered because it was
> >> disallowed (e.g. interrupts disabled). When L1 executes
> >> VMLAUNCH/VMRESUME, L0 needs to evaluate whether this pending event
> >> should cause an exit from L2 to L1 or be delivered directly to L2
> >> (e.g. in case L1 doesn't intercept EXTERNAL_INTERRUPT).
> >> 
> >> Usually this would be handled by L0 requesting an IRQ/NMI window
> >> by setting the VMCS accordingly. However, this setting was done on
> >> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> >> VMLAUNCH/VMRESUME, we force L0 to perform pending event evaluation
> >> by requesting a KVM_REQ_EVENT.
> >> 
> >> Note that the above scenario occurs when L1 KVM is about to enter L2
> >> but requests an "immediate-exit", as in that case L1 disables
> >> interrupts and then sends a self-IPI before entering L2.
> >> 
> >> Co-authored-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
> >> ---
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
> >> 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
> >> 	}
> >> 
> >> +	/*
> >> +	 * If L1 had a pending IRQ/NMI when it executed
> >> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
> >> +	 * disallowed (e.g. interrupts disabled), L0 needs to
> >> +	 * evaluate whether this pending event should cause an exit
> >> +	 * from L2 to L1 or be delivered directly to L2 (e.g. in case
> >> +	 * L1 doesn't intercept EXTERNAL_INTERRUPT).
> >> +	 *
> >> +	 * Usually this would be handled by L0 requesting an
> >> +	 * IRQ/NMI window by setting the VMCS accordingly. However,
> >> +	 * this setting was done on VMCS01 and now VMCS02 is active
> >> +	 * instead. Thus, we force L0 to perform pending event
> >> +	 * evaluation by requesting a KVM_REQ_EVENT.
> >> +	 */
> >> +	if (vmcs01_cpu_exec_ctrl &
> >> +		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
> > 
> > Looks good, pending nested interrupts will be handled on the actual VM
> > entry, so we can ignore them here.
> 
> Can you clarify?

An interrupt could have been pending in the PIR (received after the last
VM entry) and we only look at posted interrupts after disabling
interrupts, just before VMRESUME, which means that L1 might have a
pending IRQ that KVM is not aware of at this point.

sync_pir_to_irr() will notice the interrupt later and probably do the
right thing.

(It was never a problem for the immediate-exit use case, as that already
 had the interrupt synced, but missed posted interrupts could have been
 delayed until after the VM entry, and that slightly different posted
 interrupt case is not mentioned in the comment.)
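
A toy model of that ordering, in plain userspace C rather than the
actual KVM code (pir_pending/irr_pending and the two helpers are made-up
names for illustration only):

#include <stdbool.h>
#include <stdio.h>

static bool pir_pending;   /* IRQ posted by another CPU/device */
static bool irr_pending;   /* what kvm_cpu_has_interrupt() would see */
static bool req_event;     /* stands in for KVM_REQ_EVENT */

/* Stand-in for enter_vmx_non_root_mode(): the patch carries a pending
 * IRQ/NMI window request from vmcs01 over to KVM_REQ_EVENT so it is
 * re-evaluated against vmcs02.  A freshly posted interrupt may already
 * sit in the PIR here without KVM noticing it. */
static void emulate_vmlaunch(bool vmcs01_requested_window)
{
	if (vmcs01_requested_window)
		req_event = true;
}

/* Stand-in for the tail of vcpu_enter_guest(): the PIR is synced to the
 * IRR late, with interrupts disabled, so a posted interrupt missed
 * above is still noticed before the real VM entry. */
static void enter_guest(void)
{
	if (pir_pending) {
		irr_pending = true;
		pir_pending = false;
	}
	if (req_event || irr_pending)
		printf("evaluate pending events before entering L2\n");
	else
		printf("enter L2 directly\n");
}

int main(void)
{
	pir_pending = true;       /* IRQ posted after the last VM entry */
	emulate_vmlaunch(false);  /* no window was requested on vmcs01 */
	enter_guest();            /* sync_pir_to_irr() equivalent catches it */
	return 0;
}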

> >> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> >> +	}
> >> +
> >> 	/*
> >> 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
> >> 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
> >> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >> 	 * by event injection, halt vcpu.
> >> 	 */
> >> 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> >> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
> >> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
> >> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
> > 
> > What is the purpose of this check?  I think the event is recognized
> > when checking for runnability and will resume the VCPU,
> 
> That’s actually not accurate.
> 
> Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
> Later, vcpu_run() will call kvm_vcpu_running() which will return false.
> (Note that vmx_check_nested_events() won’t exit to L1 because nested_run_pending is set).
> Therefore, vcpu_block() will be called which will call
> kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()
> which does check for pending interrupts using kvm_cpu_has_interrupt().
> However, because nested_run_pending is set, vmx_interrupt_allowed() returns false and
> therefore kvm_vcpu_has_events() doesn’t return “true” for the pending interrupt…

We clear nested_run_pending just before kvm_vcpu_halt(), which is why I
think the interrupt should be noticed.
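
A minimal model of the runnability check being debated, again as plain
userspace C rather than kernel code (the struct fields and helpers are
simplified stand-ins for vmx_interrupt_allowed() and
kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()):

#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	bool nested_run_pending;
	bool pending_irq;	/* as seen by kvm_cpu_has_interrupt() */
};

/* Stand-in for vmx_interrupt_allowed(): no injection while an entry
 * into L2 is still pending. */
static bool interrupt_allowed(struct vcpu *v)
{
	return !v->nested_run_pending;
}

/* Stand-in for kvm_arch_vcpu_runnable() via kvm_vcpu_has_events(). */
static bool vcpu_runnable(struct vcpu *v)
{
	return v->pending_irq && interrupt_allowed(v);
}

int main(void)
{
	struct vcpu v = { .nested_run_pending = true, .pending_irq = true };

	/* Liran's reading: the pending IRQ is ignored while
	 * nested_run_pending is still set. */
	printf("runnable before clearing: %d\n", vcpu_runnable(&v));

	/* The point above: nested_vmx_run() clears nested_run_pending just
	 * before kvm_vcpu_halt(), so the pending IRQ should be noticed. */
	v.nested_run_pending = false;
	printf("runnable after clearing:  %d\n", vcpu_runnable(&v));
	return 0;
}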

> Therefore, I believe this change is required.
> 
> In addition, I have a unit test for this scenario as well; it fails without this change and passes with it.

A bug elsewhere would not surprise me. :)

> Will submit it soon (it's now in internal review).
> 
> -Liran
> 
> > 
> > thanks.
> > 
> >> 		vmx->nested.nested_run_pending = 0;
> >> 		return kvm_vcpu_halt(vcpu);
> >> 	}
> >> -- 
> >> 2.16.1
> >> 
> 


