Re: [PATCH v2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07/06/19 20:55, Sean Christopherson wrote:
> KVM does not have 100% coverage of VMX consistency checks, i.e. some
> checks that cause VM-Fail may only be detected by hardware during a
> nested VM-Entry.  In such a case, KVM must restore L1's state to the
> pre-VM-Enter state as L2's state has already been loaded into KVM's
> software model.
> 
> L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
> when EPT is disabled, the associated fields hold KVM's shadow values,
> not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
> come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
> as the sole anomaly.
> 
> A previously applied workaround to handle CR3 was to force nested early
> checks if EPT is disabled:
> 
>   commit 2b27924bb1d48 ("KVM: nVMX: always use early vmcs check when EPT
>                          is disabled")
> 
> Forcing nested early checks is undesirable as doing so adds hundreds of
> cycles to every nested VM-Entry.  Rather than take this performance hit,
> handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
> VM-Entry when EPT is disabled *and* nested early checks are disabled.
> By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
> naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
> 
> These shenanigans work because nested_vmx_restore_host_state() does a
> full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
> guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
> prior to re-entering L1.
> 
> vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:
> 
>     nested_vmx_restore_host_state() ->
>         kvm_mmu_reset_context() ->
>             kvm_mmu_unload() ->
>                 kvm_mmu_free_roots()
> 
> kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
> on 'root_hpa == INVALID_PAGE' unless the implementation of
> kvm_mmu_reset_context() is changed.
> 
> On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
> successful entry) via:
> 
>     vcpu_enter_guest() ->
>         kvm_mmu_reload() ->
>             kvm_mmu_load() ->
>                 kvm_mmu_load_cr3() ->
>                     vmx_set_cr3()
> 
> Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled
> as a "late" VM-Fail should never happen win that case (KVM WARNs), and
> the conditional write avoids the need to restore the correct GUEST_CR3
> when nested_vmx_check_vmentry_hw() fails.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
> 
> v2: Squashed the revert of the previous workaround into this patch
>     and added a beefy comment [Paolo].
> 
>  arch/x86/include/uapi/asm/vmx.h |  1 -
>  arch/x86/kvm/vmx/nested.c       | 44 +++++++++++++++++----------------
>  2 files changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> index d213ec5c3766..f0b0c90dd398 100644
> --- a/arch/x86/include/uapi/asm/vmx.h
> +++ b/arch/x86/include/uapi/asm/vmx.h
> @@ -146,7 +146,6 @@
>  
>  #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
>  #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL       2
> -#define VMX_ABORT_VMCS_CORRUPTED             3
>  #define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
>  
>  #endif /* _UAPIVMX_H */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0f705c7d590c..5c6f37edb16c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2964,6 +2964,25 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>  
> +	/*
> +	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
> +	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
> +	 * i.e. a VM-Fail detected by hardware but not KVM, KVM must unwind its
> +	 * software model to the pre-VMEntry host state.  When EPT is disabled,
> +	 * GUEST_CR3 holds KVM's shadow CR3, not L1's "real" CR3, which causes
> +	 * nested_vmx_restore_host_state() to corrupt vcpu->arch.cr3.  Stuffing
> +	 * vmcs01.GUEST_CR3 results in the unwind naturally setting arch.cr3 to
> +	 * the correct value.  Smashing vmcs01.GUEST_CR3 is safe because nested
> +	 * VM-Exits, and the unwind, reset KVM's MMU, i.e. vmcs01.GUEST_CR3 is
> +	 * guaranteed to be overwritten with a shadow CR3 prior to re-entering
> +	 * L1.  Don't stuff vmcs01.GUEST_CR3 when using nested early checks as
> +	 * KVM modifies vcpu->arch.cr3 if and only if the early hardware checks
> +	 * pass, and early VM-Fails do not reset KVM's MMU, i.e. the VM-Fail
> +	 * path would need to manually save/restore vmcs01.GUEST_CR3.
> +	 */
> +	if (!enable_ept && !nested_early_check)
> +		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
> +
>  	vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
>  
>  	prepare_vmcs02_early(vmx, vmcs12);
> @@ -3775,18 +3794,8 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>  	vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW));
>  
>  	nested_ept_uninit_mmu_context(vcpu);
> -
> -	/*
> -	 * This is only valid if EPT is in use, otherwise the vmcs01 GUEST_CR3
> -	 * points to shadow pages!  Fortunately we only get here after a WARN_ON
> -	 * if EPT is disabled, so a VMabort is perfectly fine.
> -	 */
> -	if (enable_ept) {
> -		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> -		__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> -	} else {
> -		nested_vmx_abort(vcpu, VMX_ABORT_VMCS_CORRUPTED);
> -	}
> +	vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> +	__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
>  
>  	/*
>  	 * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
> @@ -3794,7 +3803,8 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>  	 * VMFail, like everything else we just need to ensure our
>  	 * software model is up-to-date.
>  	 */
> -	ept_save_pdptrs(vcpu);
> +	if (enable_ept)
> +		ept_save_pdptrs(vcpu);
>  
>  	kvm_mmu_reset_context(vcpu);
>  
> @@ -5726,14 +5736,6 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
>  {
>  	int i;
>  
> -	/*
> -	 * Without EPT it is not possible to restore L1's CR3 and PDPTR on
> -	 * VMfail, because they are not available in vmcs01.  Just always
> -	 * use hardware checks.
> -	 */
> -	if (!enable_ept)
> -		nested_early_check = 1;
> -
>  	if (!cpu_has_vmx_shadow_vmcs())
>  		enable_shadow_vmcs = 0;
>  	if (enable_shadow_vmcs) {
> 

Queued, thanks.

Paolo



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux