On 2013-02-20 15:14, Nadav Har'El wrote: > Hi, > > By the way, if you haven't seen my description of why the current code > did what it did, take a look at > http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg54478.html > Another description might also come in handy: > http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg54476.html > > On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": >> This aligns VMX more with SVM regarding event injection and recovery for >> nested guests. The changes allow to inject interrupts directly from L0 >> to L2. >> >> One difference to SVM is that we always transfer the pending event >> injection into the architectural state of the VCPU and then drop it from >> there if it turns out that we left L2 to enter L1. > > Last time I checked, if I'm remembering correctly, the nested SVM code did > something a bit different: After the exit from L2 to L1 and unnecessarily > queuing the pending interrupt for injection, it skipped one entry into L1, > and as usual after the entry the interrupt queue is cleared so next time > around, when L1 one is really entered, the wrong injection is not attempted. > >> VMX and SVM are now identical in how they recover event injections from >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >> still contains a valid event and, if yes, transfer the content into L1's >> idt_vectoring_info_field. > >> To avoid that we incorrectly leak an event into the architectural VCPU >> state that L1 wants to inject, we skip cancellation on nested run. > > I didn't understand this last point. - prepare_vmcs02 sets event to be injected into L2 - while trying to enter L2, a cancel condition is met - we call vmx_cancel_interrupts but should now avoid filling L1's event into the arch event queues - it's kept in vmcs12 > >> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) >> vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); >> vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); >> >> - /* clear vm-entry fields which are to be cleared on exit */ >> - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) >> + /* drop what we picked up for L0 via vmx_complete_interrupts */ >> + vcpu->arch.nmi_injected = false; >> + kvm_clear_exception_queue(vcpu); >> + kvm_clear_interrupt_queue(vcpu); > > It would be nice to move these lines out of prepare_vmcs12(), since they > don't really do anything with vmcs12, and move it into > nested_vmx_vmexit() (which is the one which called prepare_vmcs12()). OK. > > Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and > !PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make > sure that in the former case, this doesn't clear the interrupt queue after > we put an interrupt to be injected in it (at first glance it seems fine, > but these code paths are so convoluted, it's hard to be sure). I tested both, but none of my tests was close to cover all potential corner cases. But that unconditional queue clearing surely deserves attention and critical review. > >> + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && >> + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { >> + /* >> + * Preserve the event that was supposed to be injected >> + * by emulating it would have been returned in >> + * IDT_VECTORING_INFO_FIELD. >> + */ >> + if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & >> + INTR_INFO_VALID_MASK) { >> + vmcs12->idt_vectoring_info_field = >> + vmcs12->vm_entry_intr_info_field; >> + vmcs12->idt_vectoring_error_code = >> + vmcs12->vm_entry_exception_error_code; >> + vmcs12->vm_exit_instruction_len = >> + vmcs12->vm_entry_instruction_len; >> + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); > > I'm afraid I'm missing what you are trying to do here. Why would > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be > true? After all, the processor clears it after each sucessful exit so > last if() will only succeed on failed entries - but this is NOT the > case if we're in the enclosing if (note that vmcs12->vm_exit_reason = > vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something? Canceled vmentry as indicated above. Look at vcpu_enter_guest: kvm_mmu_reload may fail, or we need to handle some async event / perform some reschedule. But those points are past prepare_vmcs02. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html