Re: [PATCH 5/24] Introduce vmcs12: a VMCS structure for L1

Avi Kivity <avi@xxxxxxxxxx> · Mon, 09 Aug 2010 23:24:43 -0400

 On 08/08/2010 11:09 AM, Nadav Har'El wrote:

+page table (with bypass_guest_pf disabled).
Might as well remove this, since nvmx will not be merged with such a
gaping hole.

In theory I ought to reject anything that doesn't comply with the spec.
In practice I'll accept deviations from the spec, so long as

- those features aren't used by common guests
- when a guest attempts to use those features, kvm issues a warning
Ok, I plugged the big gaping hole and left a small invisible hole ;-)

The situation now is that you no longer have to run kvm with bypass_guest_pf,
not on L0 and not on L1. L1 guests will run normally, possibly with
bypass_guest_pf enabled. However, when L2 guests run every page-fault will
cause an exit - regardless of what L0 or L1 tried to define via
PFEC_MASK, PFEC_MATCH and EB[pf].
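
For readers following along, here is a minimal sketch of the exit rule those three fields define, as I understand it from the VMX spec: a page fault exits when the result of the PFEC comparison equals bit 14 of the exception bitmap. The function and macro names are made up for illustration and are not taken from the patch. The second helper shows one setting under which every page fault exits, which is the behaviour described above for L2.

```c
#include <stdbool.h>
#include <stdint.h>

/* #PF is vector 14, so it is bit 14 of the exception bitmap. */
#define EB_PF_BIT (1u << 14)

/*
 * True if a page fault with error code `pfec` causes a VM exit.
 * The fault exits when "PFEC matches" and "EB[pf] is set" agree.
 */
static bool pf_causes_exit(uint32_t exception_bitmap, uint32_t pfec_mask,
                           uint32_t pfec_match, uint32_t pfec)
{
    bool matched = (pfec & pfec_mask) == pfec_match;
    bool eb_pf = (exception_bitmap & EB_PF_BIT) != 0;

    return matched == eb_pf;
}

/* Hypothetical helper: choose settings under which every #PF exits. */
static void trap_all_page_faults(uint32_t *exception_bitmap,
                                 uint32_t *pfec_mask, uint32_t *pfec_match)
{
    *exception_bitmap |= EB_PF_BIT;
    *pfec_mask = 0;   /* (pfec & 0) == 0 holds for every pfec */
    *pfec_match = 0;
}
```

With mask and match both zero the comparison always succeeds, so the outcome depends only on EB[pf].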

The reason why I said there is a "small hole" left is that now there is the
possibility that we inject L1 with a page fault that it didn't expect to get.
But in practice, this does not seem to cause any problems for either KVM
or VMWare Server.

Not nice, but acceptable.  Spurious page faults are accepted by guests 
since they're often the result of concurrent faults on the same address.

I don't think PFEC matching ought to present any implementation difficulty.
Well, it is more complicated than it first appeared (at least to me).
One problem is that there is no real way (at least none that I thought of)
to "or" the pf-trapping desires of L0 and L1.

If they use the same "sense" (bit 14 of EXCEPTION_BITMAP), you can AND 
the two PFEC_MASKs, and drop any bits remaining where PFEC_MATCH is 
different.  Not worth it, probably.
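
A rough sketch of that merge, under the assumption that both L0 and L1 set bit 14 (exit when the PFEC matches). The struct and function names are hypothetical, not kvm types. The merged filter is conservative: it exits for every fault either side asked for, and possibly for some that neither did.

```c
#include <stdint.h>

/* Hypothetical container for one hypervisor's PFEC filter. */
struct pf_filter {
    uint32_t mask;   /* PFEC_MASK  */
    uint32_t match;  /* PFEC_MATCH */
};

/*
 * Merge two "exit on match" filters into one that exits whenever
 * either input filter would have exited.
 */
static struct pf_filter merge_pf_filters(struct pf_filter a, struct pf_filter b)
{
    struct pf_filter m;

    m.mask = a.mask & b.mask;        /* keep only bits both sides test  */
    m.mask &= ~(a.match ^ b.match);  /* drop bits where the matches differ */
    m.match = a.match & m.mask;      /* the matches agree on what is left */
    return m;
}
```

For example, masks of 0x3 on both sides with matches 0x3 and 0x1 merge to mask 0x1, match 0x1: only the bit the two sides agree on is still tested.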

I solved this by trapping all
page faults, which is unfortunate. The second problem, related to the first
one, is that when L0 gets a page fault while running L2, it is now quite
difficult to figure out whether it should be injected into L1, i.e., whether
L1 asked for this specific page-fault trap to happen. We need to check whether
the page_fault_error_code matches the L1-specified pfec_mask and pfec_match
(and eb.pf), but it's actually more complicated, because the
page_fault_error_code we got from the processor refers to the shadow page
tables, and we need to translate it back to what it would mean for L1's page
tables.

You can recover the original PFEC by doing a walk_addr().

Doing this correctly would require me to spend quite a bit more time to
understand exactly how the shadow page tables code works, and I am hesitant
to do that now, when I know that common guest hypervisors
work perfectly without fixing this issue, and when most people would rather
use EPT and not shadow page tables anyway.

In any case, I left a TODO in the code about this, so it won't be forgotten.

Sure.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
