Re: KVM call minutes for Sept 21

Chris Wright <chrisw@xxxxxxxxxx> · Tue, 21 Sep 2010 18:48:41 -0700

* Nadav Har'El (nyh@xxxxxxxxxxxxxxxxxxx) wrote:
> On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> > Nested VMX
> > - looking for forward progress and better collaboration between the
> >   Intel and IBM teams
> 
> I'll be very happy if anyone, be it from Intel or somewhere else, would like
> to help me work on nested VMX.
> 
> Somebody (I don't recognize your voices yet, sorry...) mentioned on the call
> that there might not be much point in cooperation before I finish getting
> nested VMX merged into KVM.

My recollection...it was Avi.

> I agree, but my conclusion is different than what
> I think the speaker implied: My conclusion is that it is important that we
> merge the nested VMX code into KVM as soon as possible, because if nested VMX
> is part of KVM (and not a set of patches which becomes stale the moment after
> I release it) this will make it much easier for people to test it, use it,
> and cooperate in developing it.

Yup.  And especially for follow-on work (like nested EPT).  Makes sense
to merge and build from merged base rather than have out-of-tree patchset
continue to grow and grow.

> > - needs more review (not a new issue)
> 
> I think the reviews that nested VMX has received over the past year (thanks
> to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others), have been
> fantastic. You guys have shown deep understanding of the code, and found
> numerous bugs, oversights, missing features, and also a fair share of ugly
> code, and we (first Orit and Abel, and then I) have done our best to fix all
> of these issues. I've personally learned a lot from the latest round of
> reviews, and the discussions with you.
> 
> So I don't think there has been any lack of reviews. I don't think that
> getting more reviews is the most important task ahead of us.

At earlier points of review there were issues considered fundamental
that needed to be fixed before merging (SMP and proper VMPTRLD emulation
springs to mind).  Now it seems it's down to smaller, more targeted
issues.  Some hesitancy is based on the complexity of the patches.
So more review helps...test harness does too.  Anything to build Avi's
confidence in merging the code ;)

> Surely, if more people review the code, more potential bugs will be spotted.
> But this is always the case, with any software. I think the question now
> is, what would it take to finally declare the code as "good enough to be
> merged", with the understanding that even after being merged it will still be
> considered an experimental feature, disabled by default and documented as
> experimental. Nested SVM was also merged before it was perfect, and also
> KVM itself was released before being perfect :-)

;)

> > - use cases
> 
> I don't kid myself that as soon as nested VMX is available in KVM, millions
> of users worldwide will flock to use it. Definitely, many KVM users will never
> find a need for nested virtualization. But I do believe that there are many
> use cases. We outlined some of them in our paper (to be presented in a couple
> of weeks at OSDI):
> 
>   1. Hosting one of the new breed of operating systems which have a hypervisor
>      as part of them. Windows 7 with XP mode is one example. Linux with KVM
>      is another.
> 
>   2. Platforms with embedded hypervisors in firmware need nested virt to
>      run any workload - which can itself be a hypervisor with guests.
> 
>   3. Cloud users could put in their virtual machine a hypervisor with
>      sub-guests, and run multiple virtual machines on the one virtual machine
>      which they get.
> 
>   4. Enable live migration of entire hypervisors with their guests - for
>      load balancing, disaster recovery, and so on.
> 
>   5. Honeypots and protection against hypervisor-level rootkits
> 
>   6. Make it easier to test, demonstrate, benchmark and debug hypervisors,
>      and also entire virtualization setups. An entire virtualization setup
>      (hypervisor and all its guests) could be run as one virtual machine,
>      allowing testing many such setups on one physical machine.
> 
> By the way, I find the question of "why do we need nested VMX" a bit odd,
> seeing that KVM already supports nested virtualization (for SVM). Is it the
> case that nested virtualization was found useful on AMD processors, but for
> Intel processors, it isn't? Of course not :-) I think KVM should support
> nested virtualization on neither architecture, or on both - and of course
> I think it should be on both :-)

People keep looking for reasons to justify the cost of the effort, dunno
why "because it's cool" isn't good enough ;)  At any rate, that was mainly
a question of how it might be useful for production kind of environments.

> > - work todo
> >   - merge baseline patch
> >     - looks pretty good
> >     - review is finding mostly small things at this point
> >     - need some correctness verification (both review from Intel and testing)
> >   - need a test suite
> >     - test suite harness will help here
> >       - a few dozen nested SVM tests are there, can follow for nested VMX
> >   - nested EPT
>
> I've been keeping track of the issues remaining from the last review, and
> indeed only a few remain. Only 8 of the 24 patches have any outstanding
> issue, and I'm working on those that remain, as you could see on the mailing
> list in the last couple of weeks. If there's interest, I can even summarize
> these remaining issues.

If there are remaining issues that could be done by someone else, this
might be helpful.  Otherwise, probably only useful to you ;)

> But since I'm working on these patches alone, I think we need to define our
> priorities. Most of the outstanding review comments, while absolutely correct
> (and I was amazed by the quality of the reviewers' comments), deal with
> re-writing code that already works (to improve its style) or fixing relatively
> rare cases. It is not clear that these issues are more important than the
> other things listed in the summary above (test suite, nested EPT), but as
> long as I continue to rewrite pieces of the nested VMX code, I'll never get
> to those other important things.
> 
> To summarize, I'd love for us to define some sort of plan or roadmap on
> what we (or I) need to do before we can finally merge the nested VMX code
> into KVM. I would love for this roadmap to be relatively short, leaving
> some of the outstanding issues to be done after the merge.
> 
> >   - optimize (reduce vmreads and vmwrites)
> 
> Before we implemented nested VMX, we also feared that the exits on vmreads and
> vmwrites will kill the performance. As you can see in our paper (see preprint
> in http://nadav.harel.org.il/papers/nested-osdi10.pdf), we actually showed
> that this is not the case - while these extra exits do hurt performance,
> in common workloads (i.e., not pathological worst-case scenarios),
> trapping vmread/vmwrite hurts performance only moderately. For example, with
> kernbench the nested overhead (over single-level virtualization) was 14.5%,
> which could have been reduced to 10.3% if vmread/vmwrite didn't trap.
> For the SPECjbb workloads, the numbers are 7.8% vs. 6.3%. As you can see,
> the numbers would be better if it weren't for the L1 vmread/vmwrites trapping,
> but the difference is not huge. Certainly we can start with a version that
> doesn't do anything about this issue.
> 
> So I don't think there is any urgent need to optimize nested VMX (the L0)
> or the behavior of KVM as L1. Of course, there's always a long-term desire
> to continue optimizing it.
> 
> > - has long term maintan
> 
> We have been maintaining this patch set for well over a year now, so I think
> we've shown long term interest in maintaining it, even across personnel
> changes. In any case, it would have been much easier for us - and for other
> people - to maintain this patch if it was part of KVM, and we wouldn't need
> to take care of rebasing when KVM changes.

Sorry, I was typing too quickly.  That's a half-finished note which
should read:

  - has long term maintenance issues

And that means that there's two halves to the feature.  One is the nested
VMX code itself, for example each of the new EXIT_REASON_VM* handlers.
Other is glue to rest of KVM, for example, interrupt injection done
optimally.  Both have long term maintenance issues, but adding complexity
to core KVM was the context here.

thanks,
-chris