2017-12-07 13:19+0200, Liran Alon: > On 06/12/17 22:20, Radim Krčmář wrote: > > 2017-12-05 10:16+0200, Liran Alon: > > > Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr() > > > which calls vmx_hwapic_irr_update() to update RVI. > > > Currently, vmx_hwapic_irr_update() contains a tweak in case it is called > > > when CPU is running L2 and L1 don't intercept external-interrupts. > > > In that case, code injects interrupt directly into L2 instead of > > > updating RVI. > > > > > > Besides being hacky (wouldn't expect function updating RVI to also > > > inject interrupt), it also doesn't handle this case correctly. > > > The code contains several issues: > > > 1. When code calls kvm_queue_interrupt() it just passes it max_irr which > > > represents the highest IRR currently pending in L1 LAPIC. > > > This is problematic as interrupt was injected to guest but it's bit is > > > still set in LAPIC IRR instead of being cleared from IRR and set in ISR. > > > 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of > > > max_irr priority. It just checks if interrupts are enabled in guest with > > > vmx_interrupt_allowed(). > > > > Good catch. > > > > > To fix the above issues: > > > 1. Simplify vmx_hwapic_irr_update() to just update RVI. > > > Note that this shouldn't happen when CPU is running L2 > > > (See comment in code). > > > 2. Since now vmx_hwapic_irr_update() only does logic for L1 > > > virtual-interrupt-delivery, inject_pending_event() should be the > > > one responsible for injecting the interrupt directly into L2. > > > Therefore, change kvm_cpu_has_injectable_intr() to check L1 > > > LAPIC when CPU is running L2. > > > > > > This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate > > > L1 pending events when running L2 and L1 got posted-interrupt") > > > made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending > > > injectable interrupt. > > > > Right, that answers my question from the first patch. > > > > > Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery > > > injection") > > > > > > Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx> > > > Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx> > > > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > > > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > > --- > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > > @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) > > > * If we are running L2 and L1 has a new pending interrupt > > > * which can be injected, we should re-evaluate > > > * what should be done with this new L1 interrupt. > > > + * If L1 intercepts external-interrupts, we should > > > + * exit from L2 to L1. Otherwise, interrupt should be > > > + * delivered directly to L2. > > > + * These cases are handled correctly by KVM_REQ_EVENT. > > > */ > > > if (is_guest_mode(vcpu) && (max_irr > prev_max_irr)) > > > kvm_make_request(KVM_REQ_EVENT, vcpu); > > > > > > KVM_REQ_EVENT is not needed when we are intercepting the interrupt, > > because we just want to run vmx_check_nested_events(), but it is needed > > when delivering L1's interrupt inside L2. > > OK. If vmcs12 intercept external interrupts, I will just call > kvm_vcpu_exiting_guest_mode(). Otherwise, I will use KVM_REQ_EVENT. > > > > > When vmcs12 doesn't intercept external interrupts, I'm thinking we could > > pass virtual interrupts from vmcs01 and use RVI normally, as well as > > delivering L1 posted interrupts to L2 without a VM exit. > > > > Maybe I am missing something but I am not sure this is possible. > > As the comment I added to vmx_hwapic_irr_update() specifies, > virtual-interrupt-delivery cannot be enabled if exit on external-interrupts > is disabled. This can be seen in the following code in > nested_vmx_check_apicv_controls(): > /* > * If virtual interrupt delivery is enabled, > * we must exit on external interrupts. > */ > if (nested_cpu_has_vid(vmcs12) && > !nested_exit_on_intr(vcpu)) > return -EINVAL; > > Therefore, in case vmcs12 disables exit on > external-interrupts then vmcs12 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY must be > cleared. As prepare_vmcs02() currently takes value of > SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY as-is from vmcs12 to vmcs02, it means > vmcs02 doesn't have virtual-interrupt-delivery in this case. Therefore, in > this case, we should not update vmcs02 RVI. Right, vmcs12 can't configure it, but we already have PIN_BASED_EXT_INTR_MASK in vmcs02, so we'd also enable VIRTUAL_INTR_DELIVERY in vmcs02 if we detect that vmcs12 disables PIN_BASED_EXT_INTR_MASK. And because vmcs12 can't use it, we're free to configure vmcs02 with data from vmcs01, so interrupt delivery would behave as if there was no nesting. (Dislaimer: I haven't verified that it would actually work.) For now, I think that throwing KVM_REQ_EVENT in this case is perfectly acceptable and we can optimize later if we notice that it is being widely used. (Windows with device guard come to mind.) Thanks.