> On 30 Aug 2018, at 16:57, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> 2018-08-30 15:39+0300, Liran Alon:
>>
>>
>>> On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>>>
>>> 2018-08-30 12:57+0300, Liran Alon:
>>>> Consider the case where L1 has a pending IRQ/NMI event at the time
>>>> it executes VMLAUNCH/VMRESUME, but the event wasn't delivered
>>>> because it was disallowed (e.g. interrupts disabled). When L1
>>>> executes VMLAUNCH/VMRESUME, L0 needs to evaluate whether this
>>>> pending event should cause an exit from L2 to L1 or be delivered
>>>> directly to L2 (e.g. in case L1 doesn't intercept
>>>> EXTERNAL_INTERRUPT).
>>>>
>>>> Usually this would be handled by L0 requesting an IRQ/NMI window
>>>> by setting the VMCS accordingly. However, this setting was done on
>>>> vmcs01 and now vmcs02 is active instead. Thus, when L1 executes
>>>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation
>>>> by requesting KVM_REQ_EVENT.
>>>>
>>>> Note that the above scenario exists when L1 KVM is about to enter
>>>> L2 but requests an "immediate-exit", as in that case L1 will
>>>> disable interrupts and then send a self-IPI before entering L2.
>>>>
>>>> Co-authored-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>>>> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
>>>> ---
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
>>>>  		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
>>>>  	}
>>>>
>>>> +	/*
>>>> +	 * If L1 had a pending IRQ/NMI until it executed
>>>> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
>>>> +	 * disallowed (e.g. interrupts disabled), L0 needs to
>>>> +	 * evaluate if this pending event should cause an exit from
>>>> +	 * L2 to L1 or be delivered directly to L2 (e.g. in case L1
>>>> +	 * doesn't intercept EXTERNAL_INTERRUPT).
>>>> +	 *
>>>> +	 * Usually this would be handled by L0 requesting an
>>>> +	 * IRQ/NMI window by setting the VMCS accordingly. However,
>>>> +	 * this setting was done on vmcs01 and now vmcs02 is active
>>>> +	 * instead. Thus, we force L0 to perform pending event
>>>> +	 * evaluation by requesting KVM_REQ_EVENT.
>>>> +	 */
>>>> +	if (vmcs01_cpu_exec_ctrl &
>>>> +	    (CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
>>>
>>> Looks good, pending nested interrupts will be handled on the actual
>>> VM entry, so we can ignore them here.
>>
>> Can you clarify?
>
> An interrupt could have been pending in PIR (received after the last
> VM entry) and we only look at posted interrupts after disabling
> interrupts before VMRESUME, which means that L1 might have a pending
> IRQ and KVM is not aware of it at this point.
>
> sync_pir_to_irr() will notice the interrupt later and probably do the
> right thing.
>
> (It was never a problem for the immediate-exit use case as that
> already had the interrupt synced, but missing posted interrupts could
> have delayed them until after the VM entry, and the slightly different
> case of posted interrupts was not mentioned in the comment.)
>
>>>> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> +	}
>>>> +
>>>>  	/*
>>>>  	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>>>>  	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>>>> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>>>>  	 * by event injection, halt vcpu.
>>>>  	 */
>>>>  	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
>>>> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
>>>> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
>>>> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
>>>
>>> What is the purpose of this check? I think the event is recognized
>>> when checking for runnability and will resume the VCPU.
>>
>> That’s actually not accurate.
>>
>> Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
>> Later, vcpu_run() will call kvm_vcpu_running(), which will return false.
>> (Note that vmx_check_nested_events() won’t exit to L1 because
>> nested_run_pending is set.)
>> Therefore, vcpu_block() will be called, which calls
>> kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events(),
>> which does check for pending interrupts using kvm_cpu_has_interrupt().
>> However, because nested_run_pending is set, vmx_interrupt_allowed()
>> returns false, and therefore kvm_vcpu_has_events() doesn’t return
>> true for the pending interrupt…
>
> We clear nested_run_pending just before kvm_vcpu_halt(), which is why
> I think the interrupt should be noticed.

Oops. Yes, you are right.

>
>> Therefore, I believe this change is required.
>>
>> In addition, I have a unit-test for this scenario as well, and it
>> fails without this change and passes with it.
>
> A bug elsewhere would not surprise me. :)

I thought so as well, but after some digging I found that this was
actually an issue in my unit-test...

We should apply the patch I submitted here without the
"!kvm_test_request(KVM_REQ_EVENT, vcpu)" check. With that removed, it
passes all my unit-tests.

Do you want me to submit a new patch, or will you remove this while
applying?

Thanks,
-Liran

>
>> Will submit it soon (it’s now in internal review).
>>
>> -Liran
>>
>>>
>>> thanks.
>>>
>>>>  		vmx->nested.nested_run_pending = 0;
>>>>  		return kvm_vcpu_halt(vcpu);
>>>>  	}
>>>> --
>>>> 2.16.1
>>>>
>>
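P.S. For anyone following the runnability discussion above, here is a
deliberately simplified, self-contained model of the decision we were
reasoning about. It is not the real KVM code (the real path runs
through vcpu_run()/vcpu_block() in arch/x86/kvm/x86.c and
vmx_interrupt_allowed() in vmx.c); every structure and helper below is
a stripped-down stand-in. It only illustrates why clearing
nested_run_pending just before kvm_vcpu_halt() lets a pending
interrupt wake the halted vCPU:

#include <stdbool.h>
#include <stdio.h>

/* Stripped-down stand-ins for the real KVM structures. */
struct vcpu {
	bool halted;              /* models mp_state == KVM_MP_STATE_HALTED */
	bool nested_run_pending;  /* models vmx->nested.nested_run_pending */
	bool irq_pending;         /* models kvm_cpu_has_interrupt() */
	bool rflags_if;           /* models GUEST_RFLAGS.IF */
};

/* Models vmx_interrupt_allowed(): injection is blocked while a nested
 * VM entry is still pending or while the guest runs with IF=0. */
static bool interrupt_allowed(struct vcpu *v)
{
	return !v->nested_run_pending && v->rflags_if;
}

/* Models kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events(): a pending
 * interrupt only counts as a wakeup event if it may be injected. */
static bool vcpu_runnable(struct vcpu *v)
{
	return v->irq_pending && interrupt_allowed(v);
}

/* Models the vcpu_run() loop deciding between entering the guest and
 * blocking in vcpu_block()/kvm_vcpu_check_block(). */
static const char *next_step(struct vcpu *v)
{
	if (!v->halted)
		return "enter guest";
	return vcpu_runnable(v) ? "wake up and resume" : "block";
}

int main(void)
{
	struct vcpu v = {
		.halted = true,       /* vmcs12 GUEST_ACTIVITY_HLT path */
		.irq_pending = true,  /* L1 has a pending IRQ */
		.rflags_if = true,
	};

	/* If nested_run_pending were still set at this point, the
	 * pending IRQ would not count as a wakeup event and the vCPU
	 * would block, as described in the thread above: */
	v.nested_run_pending = true;
	printf("nested_run_pending=1: %s\n", next_step(&v));

	/* nested_vmx_run() clears nested_run_pending just before
	 * calling kvm_vcpu_halt(), so the interrupt is noticed: */
	v.nested_run_pending = false;
	printf("nested_run_pending=0: %s\n", next_step(&v));
	return 0;
}

Compiled with any C compiler, this prints "block" for the
nested_run_pending=1 case and "wake up and resume" once the flag is
cleared, mirroring the observation that the extra
!kvm_test_request(KVM_REQ_EVENT, vcpu) check is unnecessary.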