On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
> On 2013-02-17 16:07, Gleb Natapov wrote:
> > On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >>
> >> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
> >> vmx_complete_interrupts on L2 exits. This is required because, with
> >> direct interrupt injection from L0 to L2, L0 has to update its pending
> >> events.
> >>
> >> Also, we need to allow vmx_cancel_injection when entering L2 if we left
> >> to L0. This condition is indirectly derived from the absence of valid
> >> vectoring info in vmcs12. We now explicitly clear it if we find out that
> >> the L2 exit is not targeting L1 but L0.
> >>
> > We really need to overhaul how interrupt injection is emulated in nested
> > VMX. Why not put pending events into the event queue instead of
> > get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in the usual
> > way?
> 
> I was thinking about the same step but felt unsure so far whether
> vmx_complete_interrupts & Co. include any assumptions about the vmcs
> configuration that won't match what L1 does. So I went for a different
> path first, specifically to avoid impact on these hairy bits for
> non-nested mode.
> 
Assumptions made by those functions should still be correct since the
guest's VMCS configuration is not applied directly to real HW, but we
should be careful of course. For instance, the interrupt queues should be
cleared during a nested vmexit and the events transferred back to
idt_vectoring_info_field. IIRC this is how nested SVM works, BTW.

And with your patch you did half of the job already :) When exiting to L0
you transfer event information from
get_vmcs12(vcpu)->idt_vectoring_info_field to our internal event queues
anyway. Hmm, but you do not clear the queue during the nested vmexit. So
what happens if L2 exits to L0 with an exception in
idt_vectoring_info_field?
Now the interrupt is delivered, so the nested vmexit is done, but the
exception is left in the internal queue. I think it will be delivered into
L1 during the next vmentry.

> > 
> >> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >> ---
> >>  arch/x86/kvm/vmx.c |   43 +++++++++++++++++++++++++++----------------
> >>  1 files changed, 27 insertions(+), 16 deletions(-)
> >> 
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 68a045ae..464b6a5 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -624,6 +624,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
> >>  			    struct kvm_segment *var, int seg);
> >>  static bool guest_state_valid(struct kvm_vcpu *vcpu);
> >>  static u32 vmx_segment_access_rights(struct kvm_segment *var);
> >> +static void vmx_complete_interrupts(struct vcpu_vmx *vmx);
> >>  
> >>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> >>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >> @@ -6213,9 +6214,19 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
> >>  	else
> >>  		vmx->nested.nested_run_pending = 0;
> >>  
> >> -	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
> >> -		nested_vmx_vmexit(vcpu);
> >> -		return 1;
> >> +	if (is_guest_mode(vcpu)) {
> >> +		if (nested_vmx_exit_handled(vcpu)) {
> >> +			nested_vmx_vmexit(vcpu);
> >> +			return 1;
> >> +		}
> >> +		/*
> >> +		 * Now it's clear we are leaving to L0. Perform the postponed
> >> +		 * interrupt completion and clear L1's vectoring info field so
> >> +		 * that we do not overwrite what L0 wants to inject on
> >> +		 * re-entry.
> >> +		 */
> >> +		vmx_complete_interrupts(vmx);
> >> +		get_vmcs12(vcpu)->idt_vectoring_info_field = 0;
> >>  	}
> >>  
> >>  	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
> >> @@ -6495,8 +6506,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
> >>  
> >>  static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> >>  {
> >> -	if (is_guest_mode(&vmx->vcpu))
> >> -		return;
> >>  	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
> >>  				  VM_EXIT_INSTRUCTION_LEN,
> >>  				  IDT_VECTORING_ERROR_CODE);
> >> @@ -6504,7 +6513,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> >>  
> >>  static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
> >>  {
> >> -	if (is_guest_mode(vcpu))
> >> +	if (is_guest_mode(vcpu) &&
> >> +	    get_vmcs12(vcpu)->idt_vectoring_info_field &
> >> +	    VECTORING_INFO_VALID_MASK)
> > Why skip cancel_injection at all? As far as I can see, we can lose an
> > injected irq if we do. Consider:
> > 
> >      io thread                      vcpu in nested mode
> >   set irr 200
> >                                     clear irr 200, set isr 200
> >                                     set 200 in VM_ENTRY_INTR_INFO_FIELD
> >   set irr 250
> >   set KVM_REQ_EVENT
> >                                     if (KVM_REQ_EVENT)
> >                                             vmx_cancel_injection() <- does nothing
> 
> No, it does cancel as vmcs12's idt_vectoring_info_field signals an
> invalid state then. Only if we left L2 with valid vectoring info and are
> about to reenter do we skip the cancellation - but in that case we didn't
> inject anything from L0 previously anyway.
> 
Ah, I misread the if() condition.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
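P.S.: For what it's worth, the scheme discussed above - decode vmcs12's
idt_vectoring_info_field into the internal event queue on an exit to L0,
clear the queue and re-encode the event on an emulated nested vmexit, and
skip cancel_injection only while the field is still valid - can be modeled
in a few lines. This is just an illustrative sketch: the struct and
function names below are invented, not KVM's real ones; only the field
layout (vector in bits 7:0, type in bits 10:8, valid in bit 31) follows
the VMX spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the IDT-vectoring information field. */
#define VECTORING_INFO_VALID_MASK	(1u << 31)
#define VECTORING_INFO_TYPE_SHIFT	8

/* Toy stand-in for KVM's internal pending-event bookkeeping. */
struct toy_event_queue {
	bool	pending;
	uint8_t	vector;
	uint8_t	type;	/* 0 = ext. interrupt, 3 = HW exception, ... */
};

/* L2 exit handled by L0: move the parked event into the internal queue
 * and clear the field so it cannot overwrite a later L0 injection. */
static void toy_exit_to_l0(uint32_t *idt_vectoring_info_field,
			   struct toy_event_queue *q)
{
	uint32_t info = *idt_vectoring_info_field;

	if (info & VECTORING_INFO_VALID_MASK) {
		q->pending = true;
		q->vector = info & 0xff;
		q->type = (info >> VECTORING_INFO_TYPE_SHIFT) & 0x7;
	}
	*idt_vectoring_info_field = 0;
}

/* Emulated nested vmexit: hand the event back to L1 via vmcs12 and
 * clear the queue so it cannot leak into a later L1 vmentry. */
static void toy_nested_vmexit(uint32_t *idt_vectoring_info_field,
			      struct toy_event_queue *q)
{
	if (q->pending)
		*idt_vectoring_info_field = VECTORING_INFO_VALID_MASK |
			((uint32_t)q->type << VECTORING_INFO_TYPE_SHIFT) |
			q->vector;
	q->pending = false;
}

/* The condition from the patch: cancellation is skipped only when we
 * re-enter L2 with valid vectoring info left over from the last exit -
 * in that case L0 injected nothing, so there is nothing to cancel. */
static bool toy_skip_cancel_injection(bool guest_mode,
				      uint32_t idt_vectoring_info_field)
{
	return guest_mode &&
	       (idt_vectoring_info_field & VECTORING_INFO_VALID_MASK);
}
```

Whether the real vmx_complete_interrupts paths make assumptions beyond
this simple model is, of course, exactly the open question Jan raises.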