On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote: > From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > > If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped > vmx_complete_interrupts on L2 exits. This is required because, with > direct interrupt injection from L0 to L2, L0 has to update its pending > events. > > Also, we need to allow vmx_cancel_injection when entering L2 in we left > to L0. This condition is indirectly derived from the absence of valid > vectoring info in vmcs12. We no explicitly clear it if we find out that > the L2 exit is not targeting L1 but L0. > We really need to overhaul how interrupt injection is emulated in nested VMX. Why not put pending events into event queue instead of get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in usual way. > Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > --- > arch/x86/kvm/vmx.c | 43 +++++++++++++++++++++++++++---------------- > 1 files changed, 27 insertions(+), 16 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 68a045ae..464b6a5 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -624,6 +624,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu, > struct kvm_segment *var, int seg); > static bool guest_state_valid(struct kvm_vcpu *vcpu); > static u32 vmx_segment_access_rights(struct kvm_segment *var); > +static void vmx_complete_interrupts(struct vcpu_vmx *vmx); > > static DEFINE_PER_CPU(struct vmcs *, vmxarea); > static DEFINE_PER_CPU(struct vmcs *, current_vmcs); > @@ -6213,9 +6214,19 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) > else > vmx->nested.nested_run_pending = 0; > > - if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { > - nested_vmx_vmexit(vcpu); > - return 1; > + if (is_guest_mode(vcpu)) { > + if (nested_vmx_exit_handled(vcpu)) { > + nested_vmx_vmexit(vcpu); > + return 1; > + } > + /* > + * Now it's clear, we are leaving to L0. Perform the postponed > + * interrupt completion and clear L1's vectoring info field so > + * that we do not overwrite what L0 wants to inject on > + * re-entry. > + */ > + vmx_complete_interrupts(vmx); > + get_vmcs12(vcpu)->idt_vectoring_info_field = 0; > } > > if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) { > @@ -6495,8 +6506,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > { > - if (is_guest_mode(&vmx->vcpu)) > - return; > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > VM_EXIT_INSTRUCTION_LEN, > IDT_VECTORING_ERROR_CODE); > @@ -6504,7 +6513,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > { > - if (is_guest_mode(vcpu)) > + if (is_guest_mode(vcpu) && > + get_vmcs12(vcpu)->idt_vectoring_info_field & > + VECTORING_INFO_VALID_MASK) Why skip cancel_injection at all? As far as I see we can lose injected irq if we do. Consider: io thread vcpu in nested mode set irr 200 clear irr 200 set isr 200 set 200 in VM_ENTRY_INTR_INFO_FIELD set irr 250 set KVM_REQ_EVENT if (KVM_REQ_EVENT) vmx_cancel_injection() <- does nothing clear irr 250 set isr 250 set 250 in VM_ENTRY_INTR_INFO_FIELD vmentry So now APIC state is bogus. isr bit 200 is set but vector 200 was never injected and actually is lost forever. Next EOI will clear isr 250 and isr 200 will block all lower level interrupt forever. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html