[PATCH 0/10] nEPT v2: Nested EPT support for Nested VMX

The following patches add nested EPT support to Nested VMX.

This is the second version of this patch set. Most of the issues raised in the
previous reviews have been addressed; in particular, there is now a new variant
of paging_tmpl for EPT page tables.

However, while this version works in my tests, it still has some known bugs,
as well as unhandled issues from the previous review:

 1. 32-bit *PAE* L2s currently don't work. Non-PAE 32-bit L2s do work
    (and so do, of course, 64-bit L2s).

 2. nested_ept_inject_page_fault() assumes vm_exit_reason is already set
    to EPT_VIOLATION. However, it is conceivable that L0 emulates some
    L2 instruction, and during this emulation we read some L2 memory
    causing a need to exit (from L2 to L1) with an EPT violation.

 3. Moreover, nested_ept_inject_page_fault() now always causes an
    EPT_VIOLATION, with vmcs12->exit_qualification = fault->error_code.
    This is wrong: first, fault->error_code is not in exit qualification
    format but in PFERR_* format. Second, PFERR_RSVD_MASK would mean
    we need to cause an EPT_MISCONFIG, not an EPT_VIOLATION.
    Instead of trying to fix this by translating PFERR to exit_qualification,
    we should calculate and remember in walk_addr() the exit qualification
    (and an additional bit: whether it's an EPT VIOLATION or
    MISCONFIGURATION). This will be remembered in new fields in x86_exception.

    Avi suggested: "[add to x86_exception] another bool, to distinguish
    between EPT VIOLATION and EPT_QUALIFICATION. The error_code field should
    be extended to 64 bits for EXIT_QUALIFICATION (though only bits 0-12 are
    defined). You need another field for the guest linear address. 
    EXIT_QUALIFICATION has to be calculated, it cannot be derived from the
    original exit. Look at kvm_propagate_fault()."
    He also added: "If we're injecting an EPT VIOLATION to L1 (because we
    weren't able to resolve it; say L1 write-protected the page), then we
    need to compute EXIT_QUALIFICATION.  Bits 3-5 of EXIT_QUALIFICATION are
    computed from EPT12 paging structure entries (easy to derive them from
    pt_access/pte_access)."

 4. Also, nested_ept_inject_page_fault() doesn't set the guest linear address.

 5. There are several "TODO"s left in the code.

If anyone is willing to volunteer to help with some of these issues,
that would be great :-)
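
To make items 3 and 4 a bit more concrete, here is a rough, standalone sketch
of the kind of information the walk could record. It is not code from these
patches: struct nept_fault_info, nept_build_fault() and the NEPT_* names are
made up for illustration and merely stand in for the new x86_exception fields
discussed above. It assumes the walk already knows which access type faulted,
whether it hit reserved bits, and the accumulated EPT12 permissions. Only
exit qualification bits 0-5 are filled in; the remaining defined bits are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access type that faulted during the EPT12 walk (hypothetical names). */
enum nept_access { NEPT_ACCESS_READ, NEPT_ACCESS_WRITE, NEPT_ACCESS_FETCH };

/* Permission bits 0-2 of an EPT entry: read, write, execute. */
#define NEPT_PERM_R 0x1u
#define NEPT_PERM_W 0x2u
#define NEPT_PERM_X 0x4u

/* Hypothetical stand-in for the extra fields proposed for x86_exception. */
struct nept_fault_info {
	bool     misconfig;            /* EPT misconfiguration vs. EPT violation */
	uint64_t exit_qualification;   /* only meaningful for a violation */
	uint64_t guest_linear_address; /* item 4: must be reported to L1 too */
};

/*
 * ept_perms is the AND of the R/W/X bits of the EPT12 entries used for the
 * translation (what pt_access/pte_access would give us).
 */
static struct nept_fault_info
nept_build_fault(enum nept_access access, unsigned int ept_perms,
		 bool rsvd_bits_set, uint64_t gla)
{
	struct nept_fault_info f = { .guest_linear_address = gla };

	assert((ept_perms & ~(NEPT_PERM_R | NEPT_PERM_W | NEPT_PERM_X)) == 0);

	if (rsvd_bits_set) {
		/* Reserved bits set: misconfiguration, not a violation.
		 * This sketch simply leaves exit_qualification at 0. */
		f.misconfig = true;
		return f;
	}

	/* Bits 0-2: the access that caused the violation. */
	switch (access) {
	case NEPT_ACCESS_READ:
		f.exit_qualification |= 1u << 0;
		break;
	case NEPT_ACCESS_WRITE:
		f.exit_qualification |= 1u << 1;
		break;
	case NEPT_ACCESS_FETCH:
		f.exit_qualification |= 1u << 2;
		break;
	}

	/* Bits 3-5: what the EPT12 entries allowed (readable/writable/executable). */
	f.exit_qualification |= (uint64_t)ept_perms << 3;

	return f;
}
```

For example, a write by L2 to a page that L1 mapped read-only in EPT12 would
give an exit qualification with bit 1 (write access) and bit 3 (page was
readable) set.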


About nested EPT:
-----------------

Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when
running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set
its own cr3 and take its own page faults without either L0 or L1 getting
involved. In many workloads this significantly improves L2's performance over
the previous two alternatives (shadow page tables over EPT, and shadow page
tables over shadow page tables). Our paper [1] described these three options,
and the advantages of nested EPT ("multidimensional paging" in the paper).
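
As a toy illustration of the idea only (this is not how the patches are
implemented), the table the hardware actually uses while L2 runs can be
thought of as the composition of L1's EPT for L2 (EPT12) with L0's EPT for L1
(EPT01). The sketch below uses flat arrays of frame numbers instead of real
multi-level EPT tables with permission bits, and every name in it (toy_ept,
toy_ept02_fill, and so on) is made up for this example.

```c
#include <stdint.h>

#define TOY_PAGES      8
#define TOY_NOT_MAPPED UINT32_MAX

/* Toy single-level "EPT": index is a guest frame number, value is the
 * frame it maps to, or TOY_NOT_MAPPED. */
struct toy_ept {
	uint32_t frame[TOY_PAGES];
};

static uint32_t toy_translate(const struct toy_ept *t, uint32_t gfn)
{
	return gfn < TOY_PAGES ? t->frame[gfn] : TOY_NOT_MAPPED;
}

/*
 * Fill one entry of the combined table (EPT02) on demand:
 * L2 frame -> L1 frame via EPT12, then L1 frame -> host frame via EPT01.
 * Returns the host frame, or TOY_NOT_MAPPED if either stage has no mapping.
 */
static uint32_t toy_ept02_fill(struct toy_ept *ept02,
			       const struct toy_ept *ept12,
			       const struct toy_ept *ept01,
			       uint32_t l2_gfn)
{
	uint32_t l1_gfn = toy_translate(ept12, l2_gfn);
	if (l1_gfn == TOY_NOT_MAPPED)
		return TOY_NOT_MAPPED;	/* L1 has no mapping: L1's problem */

	uint32_t hfn = toy_translate(ept01, l1_gfn);
	if (hfn == TOY_NOT_MAPPED)
		return TOY_NOT_MAPPED;	/* L0 has no mapping: L0's problem */

	ept02->frame[l2_gfn] = hfn;
	return hfn;
}
```

Once an entry is filled in, further L2 accesses to that frame need no exit at
all, which is where the savings over the shadow-paging alternatives come from.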

Nested EPT is enabled by default (if the hardware supports EPT), so users do
not have to do anything special to enjoy the performance improvement that
this patch gives to L2 guests. L1 may of course choose not to use nested
EPT, by simply not using EPT (e.g., a KVM in L1 may use the "ept=0" option).

Just as a non-scientific, non-representative indication of the kind of
dramatic performance improvement you may see in workloads that have a lot of
context switches and page faults, here is a measurement of the time
an example single-threaded "make" took in L2 (kvm over kvm):

 shadow over shadow: 105 seconds
 ("ept=0" in L0 forces this)

 shadow over EPT: 87 seconds
 (the previous default; can be forced with "ept=0" in L1)

 EPT over EPT: 29 seconds
 (the default after this patch)

Note that the same test on L1 (with EPT) took 25 seconds, so for this example
workload, nested virtualization is now only about 16% slower than
single-level virtualization.


[1] "The Turtles Project: Design and Implementation of Nested Virtualization",
    http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Patch statistics:
-----------------

 Documentation/virtual/kvm/nested-vmx.txt |    4 
 arch/x86/include/asm/vmx.h               |    2 
 arch/x86/kvm/mmu.c                       |   52 +++-
 arch/x86/kvm/mmu.h                       |    1 
 arch/x86/kvm/paging_tmpl.h               |   98 ++++++++-
 arch/x86/kvm/vmx.c                       |  227 +++++++++++++++++++--
 arch/x86/kvm/x86.c                       |   11 -
 7 files changed, 354 insertions(+), 41 deletions(-)

--
Nadav Har'El
IBM Haifa Research Lab
