On 2013-02-19 14:13, Gleb Natapov wrote:
> Copying Alex. He wrote nested SVM.
>
> On Tue, Feb 19, 2013 at 11:04:01AM +0100, Jan Kiszka wrote:
>> On 2013-02-17 18:35, Gleb Natapov wrote:
>>> On Sun, Feb 17, 2013 at 06:01:05PM +0100, Jan Kiszka wrote:
>>>> On 2013-02-17 17:26, Gleb Natapov wrote:
>>>>> On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
>>>>>> On 2013-02-17 16:07, Gleb Natapov wrote:
>>>>>>> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
>>>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>>>>>
>>>>>>>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
>>>>>>>> vmx_complete_interrupts on L2 exits. This is required because, with
>>>>>>>> direct interrupt injection from L0 to L2, L0 has to update its pending
>>>>>>>> events.
>>>>>>>>
>>>>>>>> Also, we need to allow vmx_cancel_injection when entering L2 if we left
>>>>>>>> to L0. This condition is indirectly derived from the absence of valid
>>>>>>>> vectoring info in vmcs12. We now explicitly clear it if we find out that
>>>>>>>> the L2 exit is not targeting L1 but L0.
>>>>>>>>
>>>>>>> We really need to overhaul how interrupt injection is emulated in nested
>>>>>>> VMX. Why not put pending events into the event queue instead of
>>>>>>> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in the usual way?
>>>>>>
>>>>>> I was thinking about the same step but felt unsure so far if
>>>>>> vmx_complete_interrupts & Co. do not include any assumptions about the
>>>>>> vmcs configuration that won't match what L1 does. So I went for a
>>>>>> different path first, specifically to avoid impact on these hairy bits
>>>>>> for non-nested mode.
>>>>>>
>>>>> Assumptions made by those functions should still be correct since the guest
>>>>> VMCS configuration is not applied directly to real HW, but we should be
>>>>> careful of course. For instance, interrupt queues should be cleared
>>>>> during nested vmexit and the event transferred back to idt_vectoring_info_field.
>>>>> IIRC this is how nested SVM works BTW.
>>>>
>>>> Checking __vmx_complete_interrupts, the first issue I find is that type
>>>> 5 (privileged software exception) is not decoded, thus will be lost if
>>>> L2 leaves this way. That's a reason why it might be better to re-inject
>>>> the content of vmcs12 if it is valid. VMX is a bit more hairy than SVM,
>>>> I guess.
>>>>
>>> I do not see type 5 in SDM Table 24-15. We handle every type specified
>>> there. Why shouldn't we? SVM and VMX are pretty close in regards to
>>> event injection, this allowed us to move a lot of logic into the common
>>> code.
>>
>> I had a look at SVM to check how it deals with this, but I'm not sure
>> if I understand the logic correctly. SVM does:
>>
>> static int nested_svm_vmexit(struct vcpu_svm *svm)
>> {
>>         ...
>>         /*
>>          * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
>>          * to make sure that we do not lose injected events. So check event_inj
>>          * here and copy it to exit_int_info if it is valid.
>>          * Exit_int_info and event_inj can't be both valid because the case
>>          * below only happens on a VMRUN instruction intercept which has
>>          * no valid exit_int_info set.
>>          */
>>         if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
>>                 struct vmcb_control_area *nc = &nested_vmcb->control;
>>
>>                 nc->exit_int_info     = vmcb->control.event_inj;
>>                 nc->exit_int_info_err = vmcb->control.event_inj_err;
>>         }
>>
>> nested_svm_vmexit is only called when we leave L2 toward L1, right? So,
>> vmcb->control.event_inj might have been set on last VMRUN emulation, and
>> if that one failed, this value shall become the nested exit_int_info. So
>> far, so good.
>>
>> But what if that injection succeeded and we are now exiting L2 past the
>> execution of VMRUN, e.g. L1 intercepts the execution of some special
>> instruction in L2? Doesn't the nested exit_int_info now gain a stale
>> value? Or does the hardware clear the valid bit in EVENTINJ on
>> successful injection?
>> Didn't find an indication in the spec on first
>> glance.
> I think it should. Otherwise, even without nested guest, event will be
> reinjected on the next entry.

OK... there is apparently no place where event_inj is cleared (except for
cancellation). Makes me wonder now where a difference between event_inj
and exit_int_info could come from. From the case where we did no physical
VMRUN (nested.exit_required == true)?

Jan

>
>>
>> Otherwise the logic seems to be like this:
>>  - EVENTINJ is set to the nested value on VMRUN emulation, and only
>>    there (that's in contrast to current VMX, but it makes sense)
>>  - Interrupt completion with state transfer to the VCPU event queues is
>>    *only* performed on L2-to-L1 exits (that's like VMX is trying to do
>>    it as well)
>>  - There is a special case around nested.exit_required that I didn't
>>    fully get yet, nor can I say how it corresponds to logic in VMX.
>>
>> Jan
>>
>
> --
>                       Gleb.
>

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
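
For reference, the stale-value question raised above can be reduced to a
small user-space model. This is only a sketch, not kernel code: the
toy_* names, the two-field struct and the hw_clears_valid switch are all
made up for illustration, and the model takes no position on what the
hardware really does. It merely shows that the copy step quoted from
nested_svm_vmexit hands a valid exit_int_info to L1 exactly when the
valid bit survives a successful injection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy valid bit, named after SVM_EVTINJ_VALID; an assumption of this sketch. */
#define TOY_EVTINJ_VALID (1u << 31)

/* Hypothetical two-field stand-in for the VMCB control area. */
struct toy_control {
        uint32_t event_inj;
        uint32_t exit_int_info;
};

/*
 * Models a physical VMRUN whose injection succeeded. Whether the valid
 * bit is cleared afterwards is the open question, so it is a parameter.
 */
static void toy_vmrun_injected(struct toy_control *vmcb, bool hw_clears_valid)
{
        if (hw_clears_valid)
                vmcb->event_inj &= ~TOY_EVTINJ_VALID;
}

/* The quoted copy step, reduced to the two fields of this model. */
static void toy_nested_vmexit(const struct toy_control *vmcb,
                              struct toy_control *nested)
{
        if (vmcb->event_inj & TOY_EVTINJ_VALID)
                nested->exit_int_info = vmcb->event_inj;
}

/* Returns true if L1 would see a stale, still-valid exit_int_info. */
static bool toy_stale_after_exit(bool hw_clears_valid)
{
        struct toy_control vmcb = { TOY_EVTINJ_VALID | 0x20, 0 };
        struct toy_control nested = { 0, 0 };

        toy_vmrun_injected(&vmcb, hw_clears_valid);
        toy_nested_vmexit(&vmcb, &nested);

        return (nested.exit_int_info & TOY_EVTINJ_VALID) != 0;
}
```

In this model, toy_stale_after_exit(false) returns true and
toy_stale_after_exit(true) returns false, so the copy is only harmless
if something (hardware or software) drops the valid bit after a
successful injection.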