Re: [PATCH] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT

> KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
> PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
> PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
> (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
> dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
> related issues:
> 
> 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
> overwritten in ept_load_pdptrs because the registers are believed to have
> been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
> running with PAE enabled but PDPTRs have been set up by L1.
> 
> 2) When L2 is about to enable paging and loads its CR3, we, again, attempt
> to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no
> guarantees that this will succeed (it's just a CR3 load, paging is not
> enabled yet) and if it doesn't, kvm_set_cr3 returns early without
> persisting the CR3 which is then lost and L2 crashes right after it
> enables paging.
> 
> This patch replaces the kvm_set_cr3 call with a simple register write if PAE
> and EPT are both on. CR3 is not to be interpreted in this case.
> 
> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>

Wow, that's weird!  And a very nice analysis.

Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Thanks,

Paolo

> ---
> 
> tl;dr This makes Hyper-V (Windows Server 2016) work on KVM.
> 
>  arch/x86/kvm/vmx.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 81fbda0..d4ad6a9 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9808,6 +9808,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 exec_control;
> +	u32 nested_ept_enabled = 0;
>  
>  	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>  	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -9972,6 +9973,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  				vmcs12->guest_intr_status);
>  		}
>  
> +		nested_ept_enabled = exec_control & SECONDARY_EXEC_ENABLE_EPT;
>  		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
>  	}
>  
> @@ -10113,8 +10115,17 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	vmx_set_cr4(vcpu, vmcs12->guest_cr4);
>  	vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
>  
> -	/* shadow page tables on either EPT or shadow page tables */
> -	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
> +	/*
> +	 * Shadow page tables on either EPT or shadow page tables.
> +	 * If PAE and EPT are both on, CR3 is not used by the CPU and must not
> +	 * be dereferenced.
> +	 */
> +	if (is_pae(vcpu) && is_paging(vcpu) && nested_ept_enabled) {
> +		vcpu->arch.cr3 = vmcs12->guest_cr3;
> +		__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> +	} else
> +		kvm_set_cr3(vcpu, vmcs12->guest_cr3);
> +
>  	kvm_mmu_reset_context(vcpu);
>  
>  	if (!enable_ept)
> --
> 2.7.4
> 
> 
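
For readers following the thread, the decision made in the last hunk can be
modelled outside the kernel. What follows is only a minimal sketch, not KVM
code: struct toy_vcpu, toy_set_cr3() and toy_load_guest_cr3() are hypothetical
names invented for illustration, and the pdptes_readable flag is an assumed
stand-in for whether load_pdptrs would succeed on the given CR3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state; NOT KVM's struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t cr3;     /* cached guest CR3 */
	bool pae;         /* CR4.PAE is set */
	bool paging;      /* CR0.PG is set */
	bool cr3_avail;   /* models the VCPU_EXREG_CR3 "available" bit */
};

/*
 * Hypothetical stand-in for kvm_set_cr3(). With PAE paging it first tries
 * to load the PDPTEs through CR3; if that fails it returns early and the
 * new CR3 value is never stored (issue 2 in the description above).
 */
int toy_set_cr3(struct toy_vcpu *v, uint64_t cr3, bool pdptes_readable)
{
	assert(v != NULL);
	if (v->pae && v->paging && !pdptes_readable)
		return 1;          /* early return: CR3 is lost */
	v->cr3 = cr3;
	v->cr3_avail = true;
	return 0;
}

/*
 * Models the patched branch: with PAE paging and nested EPT, L1 has
 * already supplied the PDPTEs, so CR3 is stored without being
 * dereferenced. Every other case goes through the checked path.
 */
int toy_load_guest_cr3(struct toy_vcpu *v, uint64_t guest_cr3,
		       bool nested_ept, bool pdptes_readable)
{
	assert(v != NULL);
	if (v->pae && v->paging && nested_ept) {
		v->cr3 = guest_cr3;
		v->cr3_avail = true;
		return 0;
	}
	return toy_set_cr3(v, guest_cr3, pdptes_readable);
}
```

In this model, calling toy_load_guest_cr3() with nested_ept set keeps the CR3
value even when pdptes_readable is false, while the same call with nested_ept
clear returns 1 and leaves the cached CR3 untouched. That is the behaviour the
description attributes to the unpatched kvm_set_cr3 path.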