Re: [PATCH 0/30] nVMX: Nested VMX, v9

Avi Kivity <avi@xxxxxxxxxx> · Mon, 23 May 2011 12:52:50 +0300

On 05/22/2011 10:32 PM, Nadav Har'El wrote:
On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
>  >  But if my interpretation of the code is correct, SVM isn't much closer
>  >  than VMX to the goal of moving this logic to x86.c. When some logic is
>  >  moved there, both SVM and VMX code will need to change - perhaps even
>  >  considerably. So how will it be helpful to make VMX behave exactly like
>  >  SVM does now, when the latter will also need to change considerably?
>  >
>  SVM design is much closer to the goal of moving the logic into x86.c
>  because IIRC it does not bypass parsing of IDT vectoring info into arch
>  independent structure. VMX code uses vmx->idt_vectoring_info directly.

At the risk of sounding blasphemous, I'd like to make the case that perhaps
the current nested-VMX design - regarding the IDT-vectoring-info-field
handling - is actually closer than nested-SVM to the goal of moving clean
nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
workarounds.

Let me explain, and see if you agree with my logic:

We discover at exit time whether the virtualization hardware (VMX or SVM)
exited while *delivering* an interrupt or exception to the current guest.
This is known as "idt-vectoring-information" in VMX.

What do we need to do with this idt-vectoring-information? In regular (non-
nested) guests, the answer is simple: On the next entry, we need to inject
this event again into the guest, so it can resume the delivery of the
same event it was trying to deliver. This is why the nested-unaware code
has a vmx_complete_interrupts which basically adds this idt-vectoring-info
into KVM's event queue, which on the next entry will be injected similarly
to the way virtual interrupts from userspace are injected, and so on.
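
To make the non-nested case concrete, here is a minimal sketch of that
requeueing step. The bit positions follow the IDT-vectoring information
field layout in the Intel SDM; struct pending_event and
complete_interrupts() are hypothetical stand-ins for illustration, not
the actual KVM structures or the real vmx_complete_interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the VMX IDT-vectoring information field (Intel SDM). */
#define VECTORING_INFO_VECTOR_MASK   0x000000ffu  /* bits 7:0  */
#define VECTORING_INFO_TYPE_MASK     0x00000700u  /* bits 10:8 */
#define VECTORING_INFO_ERRCODE_VALID 0x00000800u  /* bit 11    */
#define VECTORING_INFO_VALID         0x80000000u  /* bit 31    */

/* Hypothetical stand-in for an arch-independent pending-event slot. */
struct pending_event {
    bool     pending;
    uint8_t  vector;
    uint8_t  type;            /* e.g. 0 = external interrupt, 3 = hw exception */
    bool     has_error_code;
    uint32_t error_code;
};

/*
 * Non-nested case: if the exit interrupted the delivery of an event,
 * requeue that event so the next entry injects it again.
 */
static void complete_interrupts(struct pending_event *ev,
                                uint32_t idt_vectoring_info,
                                uint32_t idt_vectoring_error_code)
{
    ev->pending = false;
    if (!(idt_vectoring_info & VECTORING_INFO_VALID))
        return;  /* the exit did not interrupt an event delivery */

    ev->pending = true;
    ev->vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
    ev->type = (idt_vectoring_info & VECTORING_INFO_TYPE_MASK) >> 8;
    ev->has_error_code =
        (idt_vectoring_info & VECTORING_INFO_ERRCODE_VALID) != 0;
    ev->error_code = ev->has_error_code ? idt_vectoring_error_code : 0;
}
```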

The other thing we may need to do is expose it to userspace, in case
we're live migrating at exactly this point in time.

But with nested virtualization, this is *not* what is supposed to happen -
we do not *always* need to inject the event to the guest. We will only need
to inject the event if the next entry will be again to the same guest, i.e.,
L1 after L1, or L2 after L2. If the idt-vectoring-info came from L2, but
our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject
the event as usual, but should rather pass this idt-vectoring-info field
as the exit information that L1 gets (in nested vmx terminology, in vmcs12).

However, at the time of exit, we cannot know for sure whether L2 will actually
run next, because it is still possible that an injection from user space,
before the next entry, will cause us to decide to exit to L1.

Therefore, I believe that the clean solution isn't to keep the original
non-nested logic, which always queues the idt-vectoring-info on the
assumption that it will be injected, and then, when it shouldn't be
(because we want to exit during entry), skip the entry once as a "trick"
to avoid the wrong injection.

Rather, a clean solution is, I think, to recognize that in nested
virtualization, idt-vectoring-info is a different kind of beast than regular
injected events, and it needs to be saved at exit time in a different field
(which will of course be common to SVM and VMX). Only at entry time, after
the regular injection code (which may cause a nested exit), can we call an
x86_op to handle this special injection.
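
A rough sketch of that flow, under stated assumptions: every name below
(struct vcpu_sketch, on_exit(), on_entry()) is hypothetical and exists only
to illustrate the idea, and the only real constant is the valid bit of the
VMX field. The info is merely recorded at exit time; the choice between
re-injecting it and reporting it to L1 is made at entry time, once we know
whether a nested exit will happen.

```c
#include <stdbool.h>
#include <stdint.h>

#define VECTORING_INFO_VALID 0x80000000u  /* bit 31 of the VMX field */

enum level { LEVEL_L1, LEVEL_L2 };

/* Hypothetical vcpu state; none of these are real KVM fields. */
struct vcpu_sketch {
    enum level cur;              /* guest that just exited */
    uint32_t saved_vectoring;    /* raw idt-vectoring-info saved at exit */
    uint32_t entry_injection;    /* event to inject on the next entry */
    uint32_t l1_exit_vectoring;  /* what L1 will see (vmcs12, in VMX terms) */
};

/* Exit time: only record the info; do not queue it for injection yet. */
static void on_exit(struct vcpu_sketch *v, uint32_t idt_vectoring_info)
{
    v->saved_vectoring = idt_vectoring_info;
    v->entry_injection = 0;
    v->l1_exit_vectoring = 0;
}

/*
 * Entry time, called after the regular injection code has decided
 * whether the next entry is a nested exit to L1.
 */
static void on_entry(struct vcpu_sketch *v, bool nested_exit_to_l1)
{
    if (!(v->saved_vectoring & VECTORING_INFO_VALID))
        return;  /* nothing was being delivered at exit time */

    if (v->cur == LEVEL_L2 && nested_exit_to_l1)
        v->l1_exit_vectoring = v->saved_vectoring;  /* report to L1 */
    else
        v->entry_injection = v->saved_vectoring;    /* same guest: re-inject */

    v->saved_vectoring = 0;
}
```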

The benefit of this approach, which is closer to the current vmx code,
is, I think, that x86.c will contain clear, self-explanatory nested logic,
instead of relying on vmx.c or svm.c circumventing various x86.c functions
and mechanisms to do something different from what they were meant to do.

IMO this will cause confusion, especially with the user interfaces used 
to read/write pending events.

I think what we need to do is:

1. change ->interrupt_allowed() to return true if the interrupt flag is
   unmasked OR if in a nested guest, and we're intercepting interrupts
2. change ->set_irq() to cause a nested vmexit if in a nested guest and
   we're intercepting interrupts
3. change ->nmi_allowed() and ->set_nmi() in a similar way
4. add a .injected flag to the interrupt queue which overrides the
   nested vmexit for VM_ENTRY_INTR_INFO_FIELD and the svm equivalent; if
   present, normal injection takes place (or an error vmexit if the
   interrupt flag is clear and we cannot inject)
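
A minimal sketch of points 1, 2 and 4, as one possible reading of the list
above. The struct layouts, the outcome enum and the function bodies are
hypothetical illustrations, not the real ->interrupt_allowed() or
->set_irq() callbacks; the NMI variants and the error-vmexit case are left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vcpu state for illustration only. */
struct vcpu_sketch {
    bool in_nested_guest;     /* currently running L2 */
    bool l1_intercepts_intr;  /* L1 intercepts external interrupts */
    bool if_flag;             /* guest interrupt flag is set */
};

/* Hypothetical interrupt-queue entry carrying the proposed flag. */
struct queued_irq {
    uint8_t vector;
    bool    injected;  /* point 4: overrides the nested vmexit */
};

enum irq_outcome { OUT_INJECTED, OUT_NESTED_VMEXIT };

/* Point 1: allowed if interrupts are enabled, or if L1 will intercept. */
static bool interrupt_allowed(const struct vcpu_sketch *v)
{
    return v->if_flag || (v->in_nested_guest && v->l1_intercepts_intr);
}

/* Points 2 and 4: nested vmexit unless the entry is marked .injected. */
static enum irq_outcome set_irq(const struct vcpu_sketch *v,
                                const struct queued_irq *irq)
{
    if (v->in_nested_guest && v->l1_intercepts_intr && !irq->injected)
        return OUT_NESTED_VMEXIT;
    return OUT_INJECTED;  /* normal injection path */
}
```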

--
error compiling committee.c: too many arguments to function