Re: [PATCH] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 24, 2016 at 7:25 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> 2016-11-24 14:29+0100, Ladi Prosek:
>> KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
>> PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
>> PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
>> (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
>> dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
>> related issues:
>>
>> 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
>> overwritten in ept_load_pdptrs because the registers are believed to have
>> been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
>> running with PAE enabled but PDPTRs have been set up by L1.
>>
>> 2) When L2 is about to enable paging and loads its CR3, we, again, attempt
>> to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
>> that this will succeed (it's just a CR3 load, paging is not enabled yet) and
>> if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
>> then lost and L2 crashes right after it enables paging.
>>
>> This patch replaces the kvm_set_cr3 call with a simple register write if PAE
>> and EPT are both on. CR3 is not to be interpreted in this case.
>>
>> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>
>> ---
>>
>> tl;dr This makes Hyper-V (Windows Server 2016) work on KVM.
>
> Great work!
>
>> @@ -10113,8 +10115,17 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>       vmx_set_cr4(vcpu, vmcs12->guest_cr4);
>>       vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
>>
>> -     /* shadow page tables on either EPT or shadow page tables */
>> -     kvm_set_cr3(vcpu, vmcs12->guest_cr3);
>> +     /*
>> +      * Shadow page tables on either EPT or shadow page tables.
>> +      * If PAE and EPT are both on, CR3 is not used by the CPU and must not
>> +      * be dereferenced.
>> +      */
>> +     if (is_pae(vcpu) && is_paging(vcpu) && nested_ept_enabled) {
>
> The spec also has a "&& !is_long_mode(vcpu)" condition for PAE paging.
> I'm pretty sure that is_pae() and is_paging() is set in long mode ...
> the stuff in kvm_set_cr3() is not needed even then?

Good catch, yes the spec also has !is_long_mode and it should be here
for completeness. It won't make much of a difference though, I believe
that kvm_set_cr3 is to a large extent unnecessary here.

What kvm_set_cr3 does:

* conditional  kvm_mmu_sync_roots + kvm_make_request(KVM_REQ_TLB_FLUSH)
** kvm_mmu_sync_roots will run anyway: kvm_mmu_load <- kvm_mmu_reload
<- vcpu_enter_guest
** tlb flush will be done anyway: vmx_flush_tlb <- vmx_set_cr3 <-
kvm_mmu_load <- kvm_mmu_reload <- vcpu_enter_guest

* in long mode, it fails if (cr3 & CR3_L_MODE_RESERVED_BITS)
** nobody checks the return value
** Intel manual says "Reserved bits in CR0 and CR3 remain clear after
any load of those registers; attempts to set them have no impact."
Should we just clear the bits and carry on then? This is in conflict
with "#GP(0) .. If an attempt is made to write a 1 to any reserved bit
in CR3." Hmm.

* if (is_pae(vcpu) && is_paging(vcpu), load_pdptrs is executed
** seems legit!

* kvm_mmu_new_cr3
** will run anyway, handled in kvm_mmu_reset_context called right
after kvm_set_cr3

I am tempted to make more invasive changes. What do you think?

>> +             vcpu->arch.cr3 = vmcs12->guest_cr3;
>> +             __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
>> +     } else
>> +             kvm_set_cr3(vcpu, vmcs12->guest_cr3);
>> +
>>       kvm_mmu_reset_context(vcpu);
>>
>>       if (!enable_ept)
>> --
>> 2.7.4
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux