On Wed, Jun 23, 2010, Avi Kivity wrote about "Re: [PATCH 5/24] Introduce vmcs12: a VMCS structure for L1":
> >+
> >+We describe in much greater detail the theory behind the nested VMX feature,
> >+its implementation and its performance characteristics, in IBM Research
> >+report H-0282, "The Turtles Project: Design and Implementation of Nested
> >+Virtualization", available at:
> >+
> >+	http://bit.ly/a0o9te
>
> Please put the true url in here.

Done. By the way, since I wrote this, our paper has also been accepted to
OSDI 2010 (see http://www.usenix.org/events/osdi10/tech/), so later I will
change the link again to the conference paper.

> >+The current code support running Linux under a nested KVM using shadow
> >+page table (with bypass_guest_pf disabled).
>
> Might as well remove this, since nvmx will not be merged with such a
> gaping hole.
>
> In theory I ought to reject anything that doesn't comply with the spec.
> In practice I'll accept deviations from the spec, so long as
>
> - those features aren't used by common guests
> - when the features are attempted to be used, kvm will issue a warning

Ok, I plugged the big gaping hole and left a small invisible hole ;-)

The situation now is that you no longer have to run kvm with
bypass_guest_pf, neither on L0 nor on L1. L1 guests will run normally,
possibly with bypass_guest_pf enabled. However, when L2 guests run, every
page fault will cause an exit, regardless of what L0 or L1 tried to define
via PFEC_MASK, PFEC_MATCH and EB[pf].

The reason why I said there is a "small hole" left is that there is now the
possibility that we inject L1 with a page fault that it didn't expect to
get. In practice, however, this does not seem to cause any problems for
either KVM or VMware Server.

> I don't think PFEC matching ought to present any implementation difficulty.

Well, it is more complicated than it first appeared (at least to me).
One problem is that there is no real way (at least none that I thought of)
to "or" the pf-trapping desires of L0 and L1. I solved this by trapping all
page faults, which is unfortunate.

The second problem, related to the first, is that when L0 gets a page fault
while running L2, it is now quite difficult to figure out whether it should
be injected into L1, i.e., whether L1 asked for this specific page-fault
trap to happen. We need to check whether the page_fault_error_code matches
the L1-specified pfec_mask and pfec_match (and eb.pf), but it's actually
more complicated than that, because the page_fault_error_code we got from
the processor refers to the shadow page tables, and we need to translate it
back to what it would mean for L1's page tables.

Doing this correctly would require me to spend quite a bit more time
understanding exactly how the shadow page table code works, and I hesitate
to do that now, when I know that common guest hypervisors work perfectly
without fixing this issue, and when most people would rather use EPT than
shadow page tables anyway. In any case, I left a TODO in the code about
this, so it won't be forgotten.

-- 
Nadav Har'El                        |      Sunday, Aug  8 2010, 28 Av 5770
nyh@xxxxxxxxxxxxxxxxxxx             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |It's no use crying over spilt milk -- it
http://nadav.harel.org.il           |only makes it salty for the cat.