Re: Nested VMX support - kernel v1

On Thu, Sep 03, 2009 at 09:29:08AM +0200, Alexander Graf wrote:
>
> On 03.09.2009, at 08:01, Muli Ben-Yehuda wrote:
>
>> On Wed, Sep 02, 2009 at 05:57:39PM +0200, Alexander Graf wrote:
>>>
> >>> On 02.09.2009 at 17:38, oritw@xxxxxxxxxx wrote:
>>>
>>>> The following patches implement nested VMX support. The patches
>>>> enable a guest to use the VMX APIs in order to run its own nested
>>>> guest (i.e., enable running other hypervisors which use VMX under
>>>> KVM).
>>>
> >>> Cool! Great job here. I was expecting vmcs loads/stores to kill
> >>> performance, but apparently I was wrong. How did you make them so
> >>> fast?
>>
>> Are you asking about vmptrld (switching VMCSs) or about the cost of
>> trapping vmreads/vmwrites?
>
> vmptrld shouldn't really be too much of a problem. Just handle it as
> a reset of the shadow vmcs and you're good.
>
> No, what I was wondering about was vmread & vmwrite. Those probably
> trap quite a lot, and from what I've seen with nested SVM, trapping
> on the VMEXIT path is horribly slow, especially with shadow-on-shadow
> paging, because you just get so many of them.

Nested EPT helps compared to shadow paging by removing many page
fault exits and their associated vmreads and vmwrites. Other than
that, I don't recall that we've done anything specific to reduce the
overhead of vmreads and vmwrites. Somewhat to our surprise, it turns
out that with nested EPT, given the cost of a single vmread or
vmwrite on Nehalem-class machines, and more importantly the frequency
and distribution of vmreads and vmwrites, performance is acceptable
even with a straightforward implementation. Having said that, for
pathological cases such as L2 workloads that are dominated by L2
vmexit costs, trapping on every L1 vmread and vmwrite will be
horrendously expensive.
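
To give a feel for why this can be cheap, here is a rough sketch of
how trapped vmreads and vmwrites can be satisfied from a software
copy of L1's VMCS rather than from hardware. This is illustrative
only; all names, sizes, and helpers below are made up, and it is not
the actual patch code:

/*
 * Illustrative sketch, not the actual patch code.  Keep a flat
 * in-memory copy of the VMCS that L1 thinks it is operating on, and
 * satisfy L1's trapped vmptrld/vmread/vmwrite from that copy.
 */

#include <stdint.h>
#include <string.h>

#define NR_SHADOW_VMCS_FIELDS	256	/* illustrative size */

struct shadow_vmcs {
	uint64_t gpa;		/* guest address L1 last vmptrld'ed */
	uint64_t fields[NR_SHADOW_VMCS_FIELDS];
};

/* Map an architectural VMCS field encoding to a flat index. */
static unsigned int field_to_index(unsigned long field)
{
	/* real code needs a proper encoding-aware mapping */
	return field % NR_SHADOW_VMCS_FIELDS;
}

/* L1 executed vmptrld: treat it as a reset of the shadow vmcs. */
static void handle_l1_vmptrld(struct shadow_vmcs *svmcs, uint64_t vmcs_gpa)
{
	svmcs->gpa = vmcs_gpa;
	memset(svmcs->fields, 0, sizeof(svmcs->fields));
	/* real code would load the saved state of that VMCS instead */
}

/* L1 executed vmread: just a memory read, no hardware VMCS access. */
static uint64_t handle_l1_vmread(struct shadow_vmcs *svmcs,
				 unsigned long field)
{
	return svmcs->fields[field_to_index(field)];
}

/* L1 executed vmwrite: likewise just a memory write. */
static void handle_l1_vmwrite(struct shadow_vmcs *svmcs,
			      unsigned long field, uint64_t value)
{
	svmcs->fields[field_to_index(field)] = value;
}

Each trapped access still costs an exit to L0, which is exactly the
overhead discussed above; the point of nested EPT is that far fewer
of those exits happen in the first place.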

>>>> SMP support was fixed.  Reworking EPT support to mesh cleanly
>>>> with the current shadow paging design per Avi's comments is a
>>>> work-in-progress.
>>>>
>>>> The current patches only support a single nested hypervisor
>>>
>>> Why?
>>
>> See above---no fundamental limitation---but needs more work. Bug
>> reports happily accepted, patches even more so :-)
>
> Well, maybe I misunderstand the wording. Does "a single nested
> hypervisor" mean "one user of VMX per VCPU"?
>
> If so, it's only vmptrld that's not really well implemented.
>
> It does sound as if you only support one nested hypervisor
> throughout all VMs, which wouldn't make sense, since all nested data
> should be vcpu-local.

We only support one nested hypervisor across all VMs, but that is a
statement about what we've currently implemented and tested, not a
fundamental design limitation. Supporting multiple nested hypervisors
shouldn't be particularly difficult, except that we may have taken
some shortcuts, such as using global data rather than vcpu-local
data, that will need to be fixed. It's on the roadmap :-)
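
Concretely, the fix amounts to moving the nested state off globals
and into per-vcpu data, along these lines (again just a sketch with
made-up field names, reusing the shadow_vmcs from the sketch above,
not the actual patch code):

/*
 * Illustrative sketch, not the actual patch code.  Everything that
 * describes L1's use of VMX must be vcpu-local for multiple nested
 * hypervisors (and SMP L1 guests) to work.
 */

#include <stdint.h>

struct nested_vmx_state {
	int vmxon;			/* has this vcpu executed vmxon? */
	uint64_t current_vmptr;		/* gpa from L1's last vmptrld */
	struct shadow_vmcs *svmcs;	/* software copy of that VMCS */
};

struct nested_vcpu {
	/* ... the usual per-vcpu state ... */
	struct nested_vmx_state nested;	/* per-vcpu, not a global */
};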

>>> How about Hyper-V and Xen?
>>
>> We haven't tried them.
>
> It might be worth giving Xen a try. I found it to be the second
> easiest target (after KVM).

Thanks, Xen is also on the roadmap but way down the list.

>>> Again, great job and congratulations on making this work!
>>
>> Thank you, your patches were very useful!
>
> It's good to see that they inspired you. In fact, I even saw quite
> a few structural resemblances in the source code :-).
>
> Will you be at the LPC where I'll be giving a talk about nested SVM?
>
> I'd love to get you on stage so you get a chance to tell people
> that this even works for VMX. Last time I gave a talk on this
> topic, I could only say that no such thing existed.

Unfortunately I don't think anyone from Haifa will be there. Perhaps
Anthony or Mike (CC'd) will be there?

Cheers,
Muli
-- 
Muli Ben-Yehuda | muli@xxxxxxxxxx | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory
