On Thu, 2018-08-30 at 03:24 +0300, Liran Alon wrote:
> 
> 
> > On 29 Aug 2018, at 19:34, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> > 
> > I agree.  When I said I didn't think it was the correct fix, I was
> > thinking that we should propagate the pending interrupt from vmcs01
> > to vmcs02, but I realized that was wrong after analyzing everything
> > for the thousandth time.
> > 
> > So, I agree that the general direction is correct, though I think we
> > can narrow down when we force events to be re-evaluated and also be
> > more explicit in the reasoning.  And this should also override the
> > HALT_STATE handling, e.g. the injection to L1 will wake the CPU from
> > its halt state.  I think the HALT_STATE case was why I saw L2 failing
> > the preemption unit test.
> > 
> > Hopefully I didn't mangle the patch below...
> > 
> > From 69617481cb5d8813046d32d2b6881e97b88a746e Mon Sep 17 00:00:00 2001
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Date: Wed, 29 Aug 2018 11:06:14 -0700
> > Subject: [PATCH] KVM: nVMX: re-evaluate events if L1 should get an INTR/NMI after VMEnter
> > 
> > Force re-evaluation of events prior to running vmcs02 if vmcs01 has
> > a pending INTR or NMI and vmcs12 is configured to exit on the event,
> > in which case the event will cause a VMExit to L1 immediately after
> > VMEnter regardless of L2's event blocking.  Re-evaluating events is
> > needed to ensure L0 triggers an immediate VMExit from L2 in order to
> > inject the INTR or NMI into L1.
> > 
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---
> >  arch/x86/kvm/vmx.c | 27 ++++++++++++++++++++++-----
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 8dae47e7267a..a5395fc39cb2 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -12602,7 +12602,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >  	struct vmcs12 *vmcs12;
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	u32 interrupt_shadow = vmx_get_interrupt_shadow(vcpu);
> > -	u32 exit_qual;
> > +	u32 exit_qual, vmcs01_cpu_exec_ctrl;
> >  	int ret;
> > 
> >  	if (!nested_vmx_check_permission(vcpu))
> > @@ -12674,8 +12674,11 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> > 
> >  	/*
> >  	 * We're finally done with prerequisite checking, and can start with
> > -	 * the nested entry.
> > +	 * the nested entry.  Snapshot the CPU-based execution controls from
> > +	 * vmcs01 before loading vmcs02, we'll need them to properly handle
> > +	 * post-VMEnter INTR/NMI injection to L1.
> >  	 */
> > +	vmcs01_cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> > 
> >  	vmx->nested.nested_run_pending = 1;
> >  	ret = enter_vmx_non_root_mode(vcpu, &exit_qual);
> > @@ -12701,11 +12704,25 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >  		nested_cache_shadow_vmcs12(vcpu, vmcs12);
> > 
> >  	/*
> > -	 * If we're entering a halted L2 vcpu and the L2 vcpu won't be woken
> > -	 * by event injection, halt vcpu.
> > +	 * Force re-evaluation of events prior to running vmcs02 if vmcs01 has
> > +	 * a pending INTR or NMI and vmcs12 is configured to exit on the event,
> > +	 * in which case the event will cause a VMExit to L1 immediately after
> > +	 * VMEnter regardless of L2's event blocking.  Re-evaluating events is
> > +	 * needed to ensure L0 triggers an immediate VMExit from L2 in order to
> > +	 * inject the INTR or NMI into L1.
> >  	 */
> > -	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> > +	if (((vmcs01_cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING) &&
> > +	     (vmcs12->pin_based_vm_exec_control & PIN_BASED_EXT_INTR_MASK)) ||
> > +	    ((vmcs01_cpu_exec_ctrl & CPU_BASED_VIRTUAL_NMI_PENDING) &&
> > +	     (vmcs12->pin_based_vm_exec_control & PIN_BASED_NMI_EXITING))) {
> > +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +	} else if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> >  	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
> > +		/*
> > +		 * Halt the vCPU if we're entering a halted L2 vCPU and the L2
> > +		 * vCPU won't be woken by an injected event, e.g. VOE to L2 or
> > +		 * INTR/NMI to L1.
> > +		 */
> >  		vmx->nested.nested_run_pending = 0;
> >  		return kvm_vcpu_halt(vcpu);
> >  	}
> 
> (Sorry for opening a new email thread. Something screwed up in my email client…)
> 
> I see a couple of issues with the patch you suggest:
> 
> 1. When L1 doesn't intercept external interrupts, the interrupt should still be injected after vmlaunch/vmresume, but directly to L2.
> Therefore, we should remove the tests against the vmcs12->pin_based_vm_exec_control bits.
> (KVM_REQ_EVENT will handle this direct injection to L2 correctly.)

Right, not sure why I brought vmcs12 into the picture.  The behavior
we're enforcing is the VMExit to L0; everything after that is handled
by whatever L0 normally does after the associated VMExit.

> 2. The logic for deciding if we should set KVM_REQ_EVENT should be inside enter_vmx_non_root_mode() and not nested_vmx_run().
> This is because otherwise, you could reach a buggy scenario after nVMX migration, which sets nVMX state directly from vmx_set_nested_state().

In that case I think we'd want to check nested_run_pending, i.e. entry
from SMM shouldn't set KVM_REQ_EVENT.  If nested_run_pending==false and
one of the pending bits is set (in the migration case) we already goofed.

> As I agree with you regarding your other comments, I will create a v2 of my patch and submit it.
:)

> (Together with the relevant kvm-unit-test)
> 
> Thanks,
> -Liran