Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0 to L2

Gleb Natapov <gleb@xxxxxxxxxx> · Tue, 19 Feb 2013 15:13:33 +0200

Copying Alex. He wrote nested SVM.

On Tue, Feb 19, 2013 at 11:04:01AM +0100, Jan Kiszka wrote:
> On 2013-02-17 18:35, Gleb Natapov wrote:
> > On Sun, Feb 17, 2013 at 06:01:05PM +0100, Jan Kiszka wrote:
> >> On 2013-02-17 17:26, Gleb Natapov wrote:
> >>> On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
> >>>> On 2013-02-17 16:07, Gleb Natapov wrote:
> >>>>> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
> >>>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> >>>>>>
> >>>>>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
> >>>>>> vmx_complete_interrupts on L2 exits. This is required because, with
> >>>>>> direct interrupt injection from L0 to L2, L0 has to update its pending
> >>>>>> events.
> >>>>>>
> >>>>>> Also, we need to allow vmx_cancel_injection when entering L2 if we left
> >>>>>> to L0. This condition is indirectly derived from the absence of valid
> >>>>>> vectoring info in vmcs12. We now explicitly clear it if we find out that
> >>>>>> the L2 exit is not targeting L1 but L0.
> >>>>>>
> >>>>> We really need to overhaul how interrupt injection is emulated in nested
> >>>>> VMX. Why not put pending events into the event queue instead of
> >>>>> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in the usual way?
> >>>>
> >>>> I was thinking about the same step but felt unsure so far if
> >>>> vmx_complete_interrupts & Co. do not include any assumptions about the
> >>>> vmcs configuration that won't match what L1 does. So I went for a
> >>>> different path first, specifically to avoid impact on these hairy bits
> >>>> for non-nested mode.
> >>>>
> >>> The assumptions made by those functions should still be correct since the guest
> >>> VMCS configuration is not applied directly to real HW, but we should be
> >>> careful of course. For instance, interrupt queues should be cleared
> >>> during nested vmexit and events transferred back to idt_vectoring_info_field.
> >>> IIRC this is how nested SVM works BTW.
> >>
> >> Checking __vmx_complete_interrupts, the first issue I find is that type
> >> 5 (privileged software exception) is not decoded, thus will be lost if
> >> L2 leaves this way. That's a reason why it might be better to re-inject
> >> the content of vmcs12 if it is valid. VMX is a bit more hairy than SVM,
> >> I guess.
> >>
> > I do not see type 5 in SDM Table 24-15. We handle every type specified
> > there. Why shouldn't we? SVM and VMX are pretty close in regards to
> > event injection, this allowed us to move a lot of logic into the common
> > code.
> 
> I had a look at SVM to check how it deals with this, but I'm not sure
> if I understand the logic correctly. SVM does:
> 
> static int nested_svm_vmexit(struct vcpu_svm *svm)
> {
> 	...
> 	/*
> 	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> 	 * to make sure that we do not lose injected events. So check event_inj
> 	 * here and copy it to exit_int_info if it is valid.
> 	 * Exit_int_info and event_inj can't be both valid because the case
> 	 * below only happens on a VMRUN instruction intercept which has
> 	 * no valid exit_int_info set.
> 	 */
> 	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
> 		struct vmcb_control_area *nc = &nested_vmcb->control;
> 
> 		nc->exit_int_info     = vmcb->control.event_inj;
> 		nc->exit_int_info_err = vmcb->control.event_inj_err;
> 	}
> 
> nested_svm_vmexit is only called when we leave L2 toward L1, right? So,
> vmcb->control.event_inj might have been set on last VMRUN emulation, and
> if that one failed, this value shall become the nested exit_int_info. So
> far, so good.
> 
> But what if that injection succeeded and we are now exiting L2 past the
> execution of VMRUN, e.g. L1 intercepts the execution of some special
> instruction in L2? Doesn't the nested exit_int_info now gain a stale
> value? Or does the hardware clear the valid bit in EVENTINJ on
> successful injection? Didn't find an indication in the spec on first
> glance.
I think the hardware should clear it. Otherwise, even without a nested guest,
the event would be reinjected on the next entry.
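
To make the question concrete, here is a toy model in plain C. It is not
kernel code, and it rests on an assumption that still needs to be checked
against the spec: that the valid bit in EVENTINJ is cleared once the event
has been delivered, and left alone if the entry is cut short before
delivery. The struct, the function names and the TOY_ prefix are made up
for illustration; only the position of the valid bit (bit 31) matches
SVM_EVTINJ_VALID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for SVM_EVTINJ_VALID (bit 31 of EVENTINJ). */
#define TOY_EVTINJ_VALID (UINT32_C(1) << 31)

/* Hypothetical, heavily reduced control area: only the fields at issue. */
struct toy_control {
	uint32_t event_inj;
	uint32_t exit_int_info;
};

/*
 * Model of a guest entry. ASSUMPTION: if the injected event is delivered,
 * the valid bit is cleared; if the entry is cut short before delivery,
 * the field is left untouched.
 */
void toy_guest_entry(struct toy_control *hw, bool delivered)
{
	if (delivered)
		hw->event_inj &= ~TOY_EVTINJ_VALID;
}

/* Mirrors the check quoted above from nested_svm_vmexit. */
void toy_nested_vmexit(const struct toy_control *hw,
		       struct toy_control *nested)
{
	if (hw->event_inj & TOY_EVTINJ_VALID)
		nested->exit_int_info = hw->event_inj;
}
```

Under that assumption, a later L2 exit after a delivered injection copies
nothing, so the nested exit_int_info cannot pick up a stale value. If the
bit were not cleared, the copy would happen on every exit to L1.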

> 
> Otherwise the logic seems to be like this:
>  - EVENTINJ is set to the nested value on VMRUN emulation, and only
>    there (that's in contrast to current VMX, but it makes sense)
>  - Interrupt completion with state transfer to the VCPU event queues is
>    *only* performed on L2-to-L1 exits (that's like VMX is trying to do
>    it as well)
>  - There is a special case around nested.exit_required that I didn't
>    fully get yet, nor can I say how it corresponds to logic in VMX.
> 
> Jan
> 
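For the queue-based approach suggested earlier in the thread, the step on
an L2-to-L1 exit could look roughly like the sketch below. It is only a toy
model in plain C: the one-slot queue, the reduced vmcs12 struct and all
toy_ names are hypothetical, and the interruption type and error code bits
are left out. Only bit 31 as the valid bit follows the vectoring-info
format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 marks the field as valid, as in the VMX vectoring-info format. */
#define TOY_VECTORING_INFO_VALID (UINT32_C(1) << 31)

/* Hypothetical one-slot event queue; the real VCPU state is richer. */
struct toy_event_queue {
	bool pending;
	uint8_t vector;
};

/* Hypothetical reduced vmcs12 with only the field discussed here. */
struct toy_vmcs12 {
	uint32_t idt_vectoring_info_field;
};

/*
 * On an L2-to-L1 exit: hand any still-pending event back to L1 through
 * idt_vectoring_info_field and clear the queue, so that L0 does not
 * inject the same event a second time.
 */
void toy_nested_vmexit_sync(struct toy_event_queue *q, struct toy_vmcs12 *v12)
{
	if (q->pending) {
		v12->idt_vectoring_info_field =
			TOY_VECTORING_INFO_VALID | q->vector;
		q->pending = false;
	} else {
		v12->idt_vectoring_info_field = 0;
	}
}
```

The point of the sketch is just the ownership rule: after the exit, the
event lives either in the queue or in vmcs12, never in both.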

--
			Gleb.