Hi, thanks for the summary. I also listened in on the call, and I'm glad
these issues are being discussed.

On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> Nested VMX
> - looking for forward progress and better collaboration between the
>   Intel and IBM teams

I'll be very happy if anyone, be it from Intel or somewhere else, would
like to help me work on nested VMX.

Somebody (I don't recognize your voices yet, sorry...) mentioned on the
call that there might not be much point in cooperation before I finish
getting nested VMX merged into KVM. I agree, but my conclusion is different
from what I think the speaker implied: my conclusion is that it is
important that we merge the nested VMX code into KVM as soon as possible.
If nested VMX is part of KVM - and not a set of patches which becomes stale
the moment after I release it - it will be much easier for people to test
it, use it, and cooperate in developing it.

> - needs more review (not a new issue)

I think the reviews that nested VMX has received over the past year (thanks
to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others) have been
fantastic. You have shown deep understanding of the code, and found
numerous bugs, oversights, missing features, and a fair share of ugly code,
and we (first Orit and Abel, and then I) have done our best to fix all of
these issues. I've personally learned a lot from the latest round of
reviews and from the discussions with you. So I don't think there has been
any lack of reviews.

I also don't think that getting more reviews is the most important task
ahead of us. Surely, if more people review the code, more potential bugs
will be spotted - but this is always the case, with any software. I think
the question now is: what would it take to finally declare the code "good
enough to be merged", with the understanding that even after being merged
it will still be considered an experimental feature, disabled by default
and documented as experimental? Nested SVM was also merged before it was
perfect, and KVM itself was released before being perfect :-)

> - use cases

I don't kid myself that as soon as nested VMX is available in KVM, millions
of users worldwide will flock to use it. Certainly, many KVM users will
never find a need for nested virtualization. But I do believe that there
are many use cases. We outlined some of them in our paper (to be presented
in a couple of weeks at OSDI):

1. Hosting one of the new breed of operating systems which have a
   hypervisor as part of them. Windows 7 with XP mode is one example;
   Linux with KVM is another.

2. Platforms with hypervisors embedded in firmware need nested
   virtualization to run any workload - which can itself be a hypervisor
   with guests.

3. Cloud users could put a hypervisor with sub-guests in their virtual
   machine, and run multiple virtual machines on the one virtual machine
   which they get.

4. Enabling live migration of entire hypervisors with their guests - for
   load balancing, disaster recovery, and so on.

5. Honeypots and protection against hypervisor-level rootkits.

6. Making it easier to test, demonstrate, benchmark and debug hypervisors,
   and entire virtualization setups. An entire virtualization setup
   (hypervisor and all its guests) can be run as one virtual machine,
   allowing many such setups to be tested on one physical machine.

By the way, I find the question of "why do we need nested VMX" a bit odd,
seeing that KVM already supports nested virtualization (for SVM). Is it the
case that nested virtualization was found useful on AMD processors, but not
on Intel processors? Of course not :-) I think KVM should support nested
virtualization either on both architectures or on neither - and of course I
think it should be on both :-)

> - work todo
>   - merge baseline patch
>     - looks pretty good
>     - review is finding mostly small things at this point
>     - need some correctness verification (both review from Intel and testing)
>   - need a test suite
>     - test suite harness will help here
>     - a few dozen nested SVM tests are there, can follow for nested VMX
>   - nested EPT

I've been keeping track of the issues remaining from the last review, and
indeed only a few remain: only 8 of the 24 patches have any outstanding
issue, and I'm working on those that remain, as you can see on the mailing
list in the last couple of weeks. If there's interest, I can even summarize
these remaining issues.

But since I'm working on these patches alone, I think we need to define our
priorities. Most of the outstanding review comments, while absolutely
correct (and I was amazed by the quality of the reviewers' comments), deal
with rewriting code that already works (to improve its style) or with
fixing relatively rare cases. It is not clear that these issues are more
important than the other things listed in the summary above (test suite,
nested EPT) - but as long as I continue to rewrite pieces of the nested VMX
code, I'll never get to those other important things.

To summarize, I'd love for us to define some sort of plan or roadmap of
what we (or I) need to do before we can finally merge the nested VMX code
into KVM. I would love for this roadmap to be relatively short, leaving
some of the outstanding issues to be done after the merge.
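Regarding the test suite item above: I agree that following the existing
nested SVM tests is the natural way to go. Just to make it concrete, here
is roughly the shape I imagine a first nested VMX test taking. This is only
a sketch under assumptions - the report() helper is an assumed harness
hook, and the setup details are illustrative, not the actual test-harness
API:

/*
 * Sketch of a first nested VMX test, in the spirit of the nested SVM
 * tests. The report() helper is an assumed harness hook, and CR0/CR4
 * fixed-bit handling is glossed over - this shows the shape of such a
 * test, not working harness code. It checks that a guest (L1 under a
 * nested-VMX-enabled L0) can execute VMXON and VMXOFF.
 */
#include <stdint.h>

#define MSR_IA32_VMX_BASIC	0x480
#define X86_CR4_VMXE		(1UL << 13)

extern void report(const char *name, int pass);	/* assumed harness helper */

/* VMXON region: 4KB, 4KB-aligned, VMCS revision id in the first word */
static uint8_t vmxon_region[4096] __attribute__((aligned(4096)));

static uint64_t rdmsr(uint32_t idx)
{
	uint32_t lo, hi;
	asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(idx));
	return lo | ((uint64_t)hi << 32);
}

int main(void)
{
	uint8_t fail;
	unsigned long cr4;
	/* the test guest runs identity-mapped, so virtual == physical */
	uint64_t paddr = (uintptr_t)vmxon_region;

	*(uint32_t *)vmxon_region = (uint32_t)rdmsr(MSR_IA32_VMX_BASIC);

	/* CR4.VMXE must be set before VMXON */
	asm volatile("mov %%cr4, %0" : "=r"(cr4));
	asm volatile("mov %0, %%cr4" : : "r"(cr4 | X86_CR4_VMXE));

	/* VMXON reports failure through CF or ZF, hence setbe */
	asm volatile("vmxon %1; setbe %0"
		     : "=q"(fail) : "m"(paddr) : "cc", "memory");
	report("vmxon", !fail);

	if (!fail)
		asm volatile("vmxoff");
	return 0;
}

Even a handful of tests of this shape (vmxon/vmxoff, vmclear, vmlaunch of a
trivial L2) would go a long way toward the "correctness verification" the
minutes ask for.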
> - optimize (reduce vmreads and vmwrites)

Before we implemented nested VMX, we also feared that the exits on vmread
and vmwrite would kill the performance. As you can see in our paper (there
is a preprint at http://nadav.harel.org.il/papers/nested-osdi10.pdf), we
showed that this is not the case: while these extra exits do hurt
performance, in common workloads (i.e., not pathological worst-case
scenarios) the trapping vmreads and vmwrites hurt performance only
moderately. For example, with kernbench the nested overhead (over
single-level virtualization) was 14.5%, which could have been reduced to
10.3% if vmread/vmwrite didn't trap. For the SPECjbb workload, the numbers
are 7.8% vs. 6.3%. The numbers would be better if it weren't for the L1
vmreads and vmwrites trapping, but the difference is not huge. We can
certainly start with a version that doesn't do anything about this issue.

So I don't think there is any urgent need to optimize nested VMX (the L0)
or the behavior of KVM as L1. Of course, there is always a long-term desire
to continue optimizing it.
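For those wondering where this overhead comes from: L1's vmreads and
vmwrites cannot run directly, because L1 is not really running the VMCS it
thinks it is running. Each one unconditionally traps to L0, which emulates
it against the software copy of the VMCS that L1 built for L2 (the
structure we call vmcs12 in the patches). Schematically, the read side
looks something like the following - a deliberately simplified sketch with
illustrative struct layout and field handling, not the actual patch code:

/*
 * Simplified sketch of vmread emulation in L0. The struct layout and
 * the two fields handled here are illustrative only; the point is
 * that every vmread executed by L1 costs a full exit to L0 plus this
 * software dispatch.
 */
#include <stdint.h>

/* L0's software image of the VMCS that L1 built for L2 ("vmcs12") */
struct vmcs12 {
	uint64_t guest_rip;		/* GUEST_RIP, encoding 0x681e */
	uint32_t vm_exit_reason;	/* VM_EXIT_REASON, encoding 0x4402 */
	/* ... one member per VMCS field that L1 may access ... */
};

/*
 * Called from the vmread exit handler after decoding the field
 * encoding from L1's instruction. Returns 0 on success, -1 to make
 * the instruction VMfail in L1.
 */
static int vmcs12_read_field(struct vmcs12 *vmcs12,
			     unsigned long field, uint64_t *value)
{
	switch (field) {
	case 0x681e:			/* GUEST_RIP */
		*value = vmcs12->guest_rip;
		return 0;
	case 0x4402:			/* VM_EXIT_REASON */
		*value = vmcs12->vm_exit_reason;
		return 0;
	default:
		return -1;
	}
}

A vmwrite from L1 is the mirror image, updating the vmcs12 member. It is
this exit-plus-software-dispatch on every access that produces the extra
overhead measured above.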
> - has long term maintainer

We have been maintaining this patch set for well over a year now, so I
think we've shown long-term interest in maintaining it, even across
personnel changes. In any case, it would be much easier for us - and for
other people - to maintain this patch set if it were part of KVM, and we
wouldn't need to take care of rebasing it whenever KVM changes.

Thanks,
Nadav.

--
Nadav Har'El                        | Wednesday, Sep 22 2010, 14 Tishri 5771
nyh@xxxxxxxxxxxxxxxxxxx             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Linux *is* user-friendly. Not
http://nadav.harel.org.il           |idiot-friendly, but user-friendly.