This series of patches implements shadow-vmcs capability for nested VMX.

Shadow-vmcs - background and overview:

In Intel VMX, the privileged vmread and vmwrite instructions are used by
the hypervisor to read and modify the guest and host specifications
(VMCS). In a nested virtualization environment, L1 executes multiple
vmread and vmwrite instructions to handle a single L2 exit. Each vmread
and vmwrite executed by L1 traps (causes an exit) to the L0 hypervisor
(KVM). L0 emulates the instruction's behaviour and resumes L1 execution.

Removing the need to trap and emulate these special instructions reduces
the number of exits and improves nested virtualization performance. As
first evaluated in [1], exit-less vmread and vmwrite can reduce nested
virtualization overhead by up to 40%.

Intel introduced a new processor feature called shadow-vmcs. Using
shadow-vmcs, L0 can configure the processor to let L1, running in guest
mode, access VMCS12 fields using vmread and vmwrite instructions without
causing an exit to L0. The VMCS12 fields' data is stored in a shadow-vmcs
controlled by L0.

Shadow-vmcs - design considerations:

A shadow-vmcs is processor-dependent and must be accessed by L0 or L1
using vmread and vmwrite instructions. With nested virtualization we aim
to abstract the hardware from the L1 hypervisor. Thus, to avoid hardware
dependencies, we preferred to keep the software-defined VMCS12 format as
part of the L1 address space and hold the processor-specific shadow-vmcs
format only in the L0 address space. In other words, the shadow-vmcs is
used by L0 as an accelerator, but its format and content are never
exposed to L1 directly. L0 syncs the content of the processor-specific
shadow vmcs with the content of the software-controlled VMCS12 format.
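
To make the syncing between the two formats concrete, here is a minimal
user-space sketch. It is not the code from these patches: the struct
layouts, the field identifiers, and the helpers shadow_vmwrite,
shadow_vmread, copy_vmcs12_to_shadow, copy_shadow_to_vmcs12 and
prepare_l1_entry are all hypothetical stand-ins, and the shadow vmcs is
simulated with a plain array instead of real vmread/vmwrite instructions.
Only the sync_shadow_vmcs flag name is taken from the changelog in this
cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative field identifiers; NOT real VMCS field encodings. */
enum shadow_field { F_GUEST_RIP, F_GUEST_RSP, F_VM_EXIT_REASON, F_COUNT };

/* Hypothetical, trimmed-down software-defined VMCS12 (format seen by L1). */
struct vmcs12 {
    uint64_t guest_rip;
    uint64_t guest_rsp;
    uint32_t vm_exit_reason;
};

/* Stand-in for the processor-specific shadow vmcs held only by L0. */
struct shadow_vmcs {
    uint64_t slot[F_COUNT];
};

/* Stand-ins for vmwrite/vmread executed against the shadow vmcs. */
static void shadow_vmwrite(struct shadow_vmcs *s, enum shadow_field f,
                           uint64_t val)
{
    s->slot[f] = val;
}

static uint64_t shadow_vmread(const struct shadow_vmcs *s,
                              enum shadow_field f)
{
    return s->slot[f];
}

/* Hypothetical per-VCPU nested state. */
struct vcpu {
    struct vmcs12 vmcs12;
    struct shadow_vmcs shadow;
    bool sync_shadow_vmcs;  /* VMCS12 changed; shadow copy is stale */
};

/* L0 -> shadow: push the software VMCS12 fields into the shadow vmcs. */
static void copy_vmcs12_to_shadow(const struct vmcs12 *v12,
                                  struct shadow_vmcs *s)
{
    shadow_vmwrite(s, F_GUEST_RIP, v12->guest_rip);
    shadow_vmwrite(s, F_GUEST_RSP, v12->guest_rsp);
    shadow_vmwrite(s, F_VM_EXIT_REASON, v12->vm_exit_reason);
}

/* shadow -> L0: pull fields L1 may have written back into VMCS12. */
static void copy_shadow_to_vmcs12(const struct shadow_vmcs *s,
                                  struct vmcs12 *v12)
{
    v12->guest_rip = shadow_vmread(s, F_GUEST_RIP);
    v12->guest_rsp = shadow_vmread(s, F_GUEST_RSP);
    v12->vm_exit_reason = (uint32_t)shadow_vmread(s, F_VM_EXIT_REASON);
}

/* Assumed hook on the path that resumes L1: sync only when flagged. */
static void prepare_l1_entry(struct vcpu *v)
{
    if (v->sync_shadow_vmcs) {
        copy_vmcs12_to_shadow(&v->vmcs12, &v->shadow);
        v->sync_shadow_vmcs = false;
    }
}

/* Walk through one assumed L2-exit round trip; returns 1 on success. */
static int demo_sync(void)
{
    struct vcpu v;
    memset(&v, 0, sizeof(v));

    /* L0 reflects an L2 exit into VMCS12 and marks the shadow stale. */
    v.vmcs12.vm_exit_reason = 48;
    v.vmcs12.guest_rip = 0x1000;
    v.sync_shadow_vmcs = true;
    prepare_l1_entry(&v);

    /* L1 handles the exit: these accesses would not exit to L0. */
    uint64_t reason = shadow_vmread(&v.shadow, F_VM_EXIT_REASON);
    shadow_vmwrite(&v.shadow, F_GUEST_RIP, 0x1004);

    /* L1 later traps to L0, which refreshes VMCS12 from the shadow. */
    copy_shadow_to_vmcs12(&v.shadow, &v.vmcs12);

    return reason == 48 && v.vmcs12.guest_rip == 0x1004 &&
           !v.sync_shadow_vmcs;
}
```

In this sketch L1 only ever observes field values through the simulated
vmread/vmwrite helpers, while the VMCS12 struct remains the authoritative,
hardware-independent copy. Calling demo_sync() from a main() returns 1.
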
We could have kept the processor-specific shadow-vmcs format in the L1
address space to avoid using the software-defined VMCS12 format. However,
this type of design/implementation would have created hardware
dependencies and would complicate other capabilities (e.g. Live Migration
of L1).

Changes since v1:
1) Added sync_shadow_vmcs flag used to indicate when the content of
   VMCS12 must be copied to the shadow vmcs. The flag value is checked
   during vmx_vcpu_run.
2) Code quality improvements

Changes since v2:
1) Allocate shadow vmcs only once per VCPU on handle_vmxon and re-use the
   same instance for multiple VMCS12s
2) More code quality improvements

Changes since v3:
1) Fixed VMXON emulation (new patch). Previous nVMX code didn't verify
   whether L1 is already in root mode (VMXON was previously called). Now
   we call nested_vmx_failValid if VMX is already ON. This is required to
   avoid host leaks (due to shadow vmcs allocation) if L1 repeatedly
   executes VMXON.
2) Improved comment: clarified that we do not shadow fields that are
   modified when L1 executes vmx instructions, like the
   VM_INSTRUCTION_ERROR field.

Changes since v4:
1) Fixed free_nested: we now free the shadow vmcs also when there is no
   current vmcs.

Acknowledgments:

Many thanks to
"Natapov, Gleb" <gleb@xxxxxxxxxx>
"Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
"Nakajima, Jun" <jun.nakajima@xxxxxxxxx>
"Har'El, Nadav" <nadav@xxxxxxxxxxxx>
for the insightful discussions, comments and reviews.

These patches were easily created and maintained using
Patchouli -- patch creator
http://patchouli.sourceforge.net/

[1] "The Turtles Project: Design and Implementation of Nested
Virtualization",
http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf