On Wed, Jun 23, 2010, Avi Kivity wrote about "Re: [PATCH 5/24] Introduce vmcs12: a VMCS structure for L1":
> >+
> >+We describe in much greater detail the theory behind the nested VMX feature,
> >+its implementation and its performance characteristics, in IBM Research
> >+report H-0282, "The Turtles Project: Design and Implementation of Nested
> >+Virtualization", available at:
> >+
> >+	http://bit.ly/a0o9te
>
> Please put the true url in here.

Done. By the way, since I wrote this, our paper has also been accepted to
OSDI 2010 (see http://www.usenix.org/events/osdi10/tech/), so later I will
change the link again to the conference paper.

> >+The current code support running Linux under a nested KVM using shadow
> >+page table (with bypass_guest_pf disabled).
>
> Might as well remove this, since nvmx will not be merged with such a
> gaping hole.
>
> In theory I ought to reject anything that doesn't comply with the spec.
> In practice I'll accept deviations from the spec, so long as
>
> - those features aren't used by common guests
> - when the features are attempted to be used, kvm will issue a warning

Ok, I plugged the big gaping hole and left a small invisible hole ;-)

The situation now is that you no longer have to run kvm with
bypass_guest_pf, neither on L0 nor on L1. L1 guests will run normally,
possibly with bypass_guest_pf enabled. However, when L2 guests run, every
page fault will cause an exit, regardless of what L0 or L1 tried to define
via PFEC_MASK, PFEC_MATCH and EB[pf].

The reason why I said there is a "small hole" left is that there is now the
possibility that we inject L1 with a page fault that it didn't expect to
get. In practice, however, this does not seem to cause any problems for
either KVM or VMware Server.

> I don't think PFEC matching ought to present any implementation difficulty.

Well, it is more complicated than it first appeared (at least to me).
One problem is that there is no real way (at least none that I thought of)
to "or" the pf-trapping desires of L0 and L1. I solved this by trapping all
page faults, which is unfortunate.

The second problem, related to the first, is that when L0 gets a page fault
while running L2, it is now quite difficult to figure out whether it should
be injected into L1, i.e., whether L1 asked for this specific page-fault
trap to happen. We need to check whether the page_fault_error_code matches
the L1-specified pfec_mask and pfec_match (and eb.pf), but it's actually
more complicated than that, because the page_fault_error_code we got from
the processor refers to the shadow page tables, and we need to translate it
back to what it would mean for L1's page tables.

Doing this correctly would require me to spend quite a bit more time
understanding exactly how the shadow page table code works, and I hesitate
to do that now, when I know that common guest hypervisors work perfectly
without fixing this issue, and when most people would rather use EPT than
shadow page tables anyway. In any case, I left a TODO in the code about
this, so it won't be forgotten.

-- 
Nadav Har'El                        |      Sunday, Aug  8 2010, 28 Av 5770
nyh@xxxxxxxxxxxxxxxxxxx             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |It's no use crying over spilt milk -- it
http://nadav.harel.org.il           |only makes it salty for the cat.