> On 30 Aug 2018, at 16:57, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>
> 2018-08-30 15:39+0300, Liran Alon:
>>
>>
>>> On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>>>
>>> 2018-08-30 12:57+0300, Liran Alon:
>>>> Consider the case where L1 has a pending IRQ/NMI event at the time
>>>> it executes VMLAUNCH/VMRESUME, but the event wasn't delivered
>>>> because it was disallowed (e.g. interrupts disabled). When L1
>>>> executes VMLAUNCH/VMRESUME, L0 needs to evaluate whether this
>>>> pending event should cause an exit from L2 to L1 or be delivered
>>>> directly to L2 (e.g. in case L1 doesn't intercept
>>>> EXTERNAL_INTERRUPT).
>>>>
>>>> Usually this would be handled by L0 requesting an IRQ/NMI window
>>>> by setting the VMCS accordingly. However, this setting was done on
>>>> vmcs01 and now vmcs02 is active instead. Thus, when L1 executes
>>>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation
>>>> by requesting KVM_REQ_EVENT.
>>>>
>>>> Note that the above scenario exists when L1 KVM is about to enter
>>>> L2 but requests an "immediate-exit", as in that case L1 will
>>>> disable interrupts and then send a self-IPI before entering L2.
>>>>
>>>> Co-authored-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>>>> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx>
>>>> ---
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
>>>>  		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
>>>>  	}
>>>>
>>>> +	/*
>>>> +	 * If L1 had a pending IRQ/NMI until it executed
>>>> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
>>>> +	 * disallowed (e.g. interrupts disabled), L0 needs to
>>>> +	 * evaluate if this pending event should cause an exit from
>>>> +	 * L2 to L1 or be delivered directly to L2 (e.g. in case L1
>>>> +	 * doesn't intercept EXTERNAL_INTERRUPT).
>>>> +	 *
>>>> +	 * Usually this would be handled by L0 requesting an
>>>> +	 * IRQ/NMI window by setting the VMCS accordingly. However,
>>>> +	 * this setting was done on vmcs01 and now vmcs02 is active
>>>> +	 * instead. Thus, we force L0 to perform pending event
>>>> +	 * evaluation by requesting KVM_REQ_EVENT.
>>>> +	 */
>>>> +	if (vmcs01_cpu_exec_ctrl &
>>>> +	    (CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
>>>
>>> Looks good, pending nested interrupts will be handled on the actual
>>> VM entry, so we can ignore them here.
>>
>> Can you clarify?
>
> An interrupt could have been pending in PIR (received after the last
> VM entry) and we only look at posted interrupts after disabling
> interrupts before VMRESUME, which means that L1 might have a pending
> IRQ and KVM is not aware of it at this point.
>
> sync_pir_to_irr() will notice the interrupt later and probably do the
> right thing.
>
> (It was never a problem for the immediate-exit use case as that
> already had the interrupt synced, but missing posted interrupts could
> have delayed them until after the VM entry, and the slightly different
> case of posted interrupts was not mentioned in the comment.)
>
>>>> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> +	}
>>>> +
>>>>  	/*
>>>>  	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>>>>  	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>>>> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>>>>  	 * by event injection, halt vcpu.
>>>>  	 */
>>>>  	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
>>>> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
>>>> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
>>>> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
>>>
>>> What is the purpose of this check? I think the event is recognized
>>> when checking for runnability and will resume the VCPU.
>>
>> That’s actually not accurate.
>>
>> Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
>> Later, vcpu_run() will call kvm_vcpu_running(), which will return false.
>> (Note that vmx_check_nested_events() won’t exit to L1 because
>> nested_run_pending is set.)
>> Therefore, vcpu_block() will be called, which calls
>> kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events(),
>> which does check for pending interrupts using kvm_cpu_has_interrupt().
>> However, because nested_run_pending is set, vmx_interrupt_allowed()
>> returns false, and therefore kvm_vcpu_has_events() doesn’t return
>> true for the pending interrupt…
>
> We clear nested_run_pending just before kvm_vcpu_halt(), which is why
> I think the interrupt should be noticed.

Oops. Yes, you are right.

>
>> Therefore, I believe this change is required.
>>
>> In addition, I have a unit-test for this scenario as well, and it
>> fails without this change and passes with it.
>
> A bug elsewhere would not surprise me. :)

I thought so as well, but after some digging I found that this was
actually an issue in my unit-test...

We should apply the patch I submitted here without the
"!kvm_test_request(KVM_REQ_EVENT, vcpu)" check. With that removed, it
passes all my unit-tests.

Do you want me to submit a new patch, or will you remove this while
applying?

Thanks,
-Liran

>
>> Will submit it soon (it’s now in internal review).
>>
>> -Liran
>>
>>>
>>> thanks.
>>>
>>>>  		vmx->nested.nested_run_pending = 0;
>>>>  		return kvm_vcpu_halt(vcpu);
>>>>  	}
>>>> --
>>>> 2.16.1
>>>>
>>
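P.S. For anyone following the runnability discussion above, here is a
deliberately simplified, self-contained model of the decision we were
reasoning about. It is not the real KVM code (the real path runs
through vcpu_run()/vcpu_block() in arch/x86/kvm/x86.c and
vmx_interrupt_allowed() in vmx.c); every structure and helper below is
a stripped-down stand-in. It only illustrates why clearing
nested_run_pending just before kvm_vcpu_halt() lets a pending
interrupt wake the halted vCPU:

#include <stdbool.h>
#include <stdio.h>

/* Stripped-down stand-ins for the real KVM structures. */
struct vcpu {
	bool halted;              /* models mp_state == KVM_MP_STATE_HALTED */
	bool nested_run_pending;  /* models vmx->nested.nested_run_pending */
	bool irq_pending;         /* models kvm_cpu_has_interrupt() */
	bool rflags_if;           /* models GUEST_RFLAGS.IF */
};

/* Models vmx_interrupt_allowed(): injection is blocked while a nested
 * VM entry is still pending or while the guest runs with IF=0. */
static bool interrupt_allowed(struct vcpu *v)
{
	return !v->nested_run_pending && v->rflags_if;
}

/* Models kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events(): a pending
 * interrupt only counts as a wakeup event if it may be injected. */
static bool vcpu_runnable(struct vcpu *v)
{
	return v->irq_pending && interrupt_allowed(v);
}

/* Models the vcpu_run() loop deciding between entering the guest and
 * blocking in vcpu_block()/kvm_vcpu_check_block(). */
static const char *next_step(struct vcpu *v)
{
	if (!v->halted)
		return "enter guest";
	return vcpu_runnable(v) ? "wake up and resume" : "block";
}

int main(void)
{
	struct vcpu v = {
		.halted = true,       /* vmcs12 GUEST_ACTIVITY_HLT path */
		.irq_pending = true,  /* L1 has a pending IRQ */
		.rflags_if = true,
	};

	/* If nested_run_pending were still set at this point, the
	 * pending IRQ would not count as a wakeup event and the vCPU
	 * would block, as described in the thread above: */
	v.nested_run_pending = true;
	printf("nested_run_pending=1: %s\n", next_step(&v));

	/* nested_vmx_run() clears nested_run_pending just before
	 * calling kvm_vcpu_halt(), so the interrupt is noticed: */
	v.nested_run_pending = false;
	printf("nested_run_pending=0: %s\n", next_step(&v));
	return 0;
}

Compiled with any C compiler, this prints "block" for the
nested_run_pending=1 case and "wake up and resume" once the flag is
cleared, mirroring the observation that the extra
!kvm_test_request(KVM_REQ_EVENT, vcpu) check is unnecessary.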