Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0 to L2

Jan Kiszka <jan.kiszka@xxxxxx> · Tue, 19 Feb 2013 11:04:01 +0100

On 2013-02-17 18:35, Gleb Natapov wrote:
> On Sun, Feb 17, 2013 at 06:01:05PM +0100, Jan Kiszka wrote:
>> On 2013-02-17 17:26, Gleb Natapov wrote:
>>> On Sun, Feb 17, 2013 at 04:31:26PM +0100, Jan Kiszka wrote:
>>>> On 2013-02-17 16:07, Gleb Natapov wrote:
>>>>> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>>>>>>
>>>>>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
>>>>>> vmx_complete_interrupts on L2 exits. This is required because, with
>>>>>> direct interrupt injection from L0 to L2, L0 has to update its pending
>>>>>> events.
>>>>>>
>>>>>> Also, we need to allow vmx_cancel_injection when entering L2 if we left
>>>>>> to L0. This condition is indirectly derived from the absence of valid
>>>>>> vectoring info in vmcs12. We now explicitly clear it if we find out that
>>>>>> the L2 exit is not targeting L1 but L0.
>>>>>>
>>>>> We really need to overhaul how interrupt injection is emulated in nested
>>>>> VMX. Why not put pending events into the event queue instead of
>>>>> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in the usual way?
>>>>
>>>> I was thinking about the same step but felt unsure so far if
>>>> vmx_complete_interrupts & Co. do not include any assumptions about the
>>>> vmcs configuration that won't match what L1 does. So I went for a
>>>> different path first, specifically to avoid impact on these hairy bits
>>>> for non-nested mode.
>>>>
>>> The assumptions made by those functions should still be correct since guest
>>> VMCS configuration is not applied directly to real HW, but we should be
>>> careful of course. For instance interrupt queues should be cleared
>>> during nested vmexit and the event transferred back to idt_vectoring_info_field.
>>> IIRC this is how nested SVM works BTW.
>>
>> Checking __vmx_complete_interrupts, the first issue I find is that type
>> 5 (privileged software exception) is not decoded, thus will be lost if
>> L2 leaves this way. That's a reason why it might be better to re-inject
>> the content of vmcs12 if it is valid. VMX is a bit more hairy than SVM,
>> I guess.
>>
> I do not see type 5 in SDM Table 24-15. We handle every type specified
> there. Why shouldn't we? SVM and VMX are pretty close in regards to
> event injection, this allowed us to move a lot of logic into the common
> code.

I had a look at SVM to check how it deals with this, but I'm not sure
if I understand the logic correctly. SVM does:

static int nested_svm_vmexit(struct vcpu_svm *svm)
{
	...
	/*
	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
	 * to make sure that we do not lose injected events. So check event_inj
	 * here and copy it to exit_int_info if it is valid.
	 * Exit_int_info and event_inj can't be both valid because the case
	 * below only happens on a VMRUN instruction intercept which has
	 * no valid exit_int_info set.
	 */
	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
		struct vmcb_control_area *nc = &nested_vmcb->control;

		nc->exit_int_info     = vmcb->control.event_inj;
		nc->exit_int_info_err = vmcb->control.event_inj_err;
	}

nested_svm_vmexit is only called when we leave L2 toward L1, right? So,
vmcb->control.event_inj might have been set on last VMRUN emulation, and
if that one failed, this value shall become the nested exit_int_info. So
far, so good.

But what if that injection succeeded and we are now exiting L2 past the
execution of VMRUN, e.g. L1 intercepts the execution of some special
instruction in L2? Doesn't the nested exit_int_info now gain a stale
value? Or does the hardware clear the valid bit in EVENTINJ on
successful injection? I didn't find an indication in the spec at first
glance.
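
To make the question concrete, here is a toy model, not kernel code. All
toy_* names are hypothetical, and the seeding of the nested value from the
hardware exit_int_info is my assumption about the elided part of the
function. Only the conditional copy mirrors the excerpt above. The
hw_clears_valid flag stands for the open question; I am not claiming
either behavior for real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit position as SVM_EVTINJ_VALID; redefined so the sketch stands alone. */
#define EVTINJ_VALID (1u << 31)

/* Hypothetical, stripped-down stand-in for the two control fields involved. */
struct toy_vmcb {
	uint32_t event_inj;     /* event L0 asks the hardware to inject */
	uint32_t exit_int_info; /* event the hardware reports as in flight on exit */
};

/*
 * Model one hardware run in which the injection succeeds and L2 later
 * exits for an unrelated intercept, so nothing is in flight at exit.
 * hw_clears_valid selects the assumed hardware behavior (the open question).
 */
static void toy_run_injection_succeeds(struct toy_vmcb *vmcb, bool hw_clears_valid)
{
	assert(vmcb != NULL);
	vmcb->exit_int_info = 0;
	if (hw_clears_valid)
		vmcb->event_inj &= ~EVTINJ_VALID;
}

/*
 * Compute what L1 would see as exit_int_info. The initial value is an
 * assumption; the conditional copy mirrors the quoted excerpt.
 */
static uint32_t toy_nested_exit_int_info(const struct toy_vmcb *vmcb)
{
	uint32_t nested = vmcb->exit_int_info;

	if (vmcb->event_inj & EVTINJ_VALID)
		nested = vmcb->event_inj;
	return nested;
}

/* True if L1 is handed an event that was already delivered to L2. */
static bool toy_nested_sees_stale_event(bool hw_clears_valid)
{
	struct toy_vmcb vmcb = {
		.event_inj = EVTINJ_VALID | 0x20, /* arbitrary example vector */
		.exit_int_info = 0,
	};

	toy_run_injection_succeeds(&vmcb, hw_clears_valid);
	return (toy_nested_exit_int_info(&vmcb) & EVTINJ_VALID) != 0;
}
```

In this model, toy_nested_sees_stale_event(false) returns true and
toy_nested_sees_stale_event(true) returns false, so the copy is only safe
if something clears the valid bit between the successful injection and
the nested exit.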

Otherwise the logic seems to be like this:
 - EVENTINJ is set to the nested value on VMRUN emulation, and only
   there (that's in contrast to current VMX, but it makes sense)
 - Interrupt completion with state transfer to the VCPU event queues is
   *only* performed on L2-to-L1 exits (that's what VMX is trying to do
   as well)
 - There is a special case around nested.exit_required that I didn't
   fully get yet, nor can I say how it corresponds to logic in VMX.

Jan
