Re: [PATCH] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT

2016-11-28 19:12+0100, Ladi Prosek:
> On Fri, Nov 25, 2016 at 3:15 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> > 2016-11-25 09:44+0100, Ladi Prosek:
>>> What kvm_set_cr3 does:
>>>
>>> * conditional  kvm_mmu_sync_roots + kvm_make_request(KVM_REQ_TLB_FLUSH)
>>> ** kvm_mmu_sync_roots will run anyway: kvm_mmu_load <- kvm_mmu_reload
>>> <- vcpu_enter_guest
>>> ** tlb flush will be done anyway: vmx_flush_tlb <- vmx_set_cr3 <-
>>> kvm_mmu_load <- kvm_mmu_reload <- vcpu_enter_guest
>>>
>>> * in long mode, it fails if (cr3 & CR3_L_MODE_RESERVED_BITS)
>>> ** nobody checks the return value
>>> ** Intel manual says "Reserved bits in CR0 and CR3 remain clear after
>>> any load of those registers; attempts to set them have no impact."
>>> Should we just clear the bits and carry on then? This is in conflict
>>> with "#GP(0) .. If an attempt is made to write a 1 to any reserved bit
>>> in CR3." Hmm.
>>
>> The spec is quite clear on this.  26.3.1.1 Checks on Guest Control
>> Registers, Debug Registers, and MSRs:
>>
>>   The following checks are performed on processors that support Intel 64
>>   architecture: The CR3 field must be such that bits 63:52 and bits in
>>   the range 51:32 beyond the processor’s physical-address width are 0.
>>
>> To verify, I tried these two options on top of vmx_vcpu_run
>>
>>   vmcs_writel(GUEST_CR3, vmcs_readl(GUEST_CR3) | 1UL << boot_cpu_data.x86_phys_bits);
>>   vmcs_writel(GUEST_CR3, vmcs_readl(GUEST_CR3) | CR3_PCID_INVD);
>>
>> and both failed VM entry.  We should fail the nested VM entry as well
>> and use cpuid_maxphyaddr() to determine when.
> 
> Thanks, I hadn't realized that the rules for VM entry are different
> from regular CR3 loads. One more reason for not using kvm_set_cr3
> here.
> 
>> (And I have a bad feeling that guest's physical address width is not
>>  being limited by hardware's ...)
> 
> Can you elaborate? In which MMU modes would it be causing problems?

It's a corner case on hardware that has fewer physical address bits than
the guest is configured for.

If L1 then sets bits between its maximum and the hardware maximum, then
VM entry in L0 will fail and that will kill L1 (report hardware error to
userspace).  L1 did nothing wrong and the bug is in L0, so killing L1 if
we hit the corner case is the best way ...
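
A minimal sketch of the two checks discussed above, written as standalone C
rather than kernel code.  The names cr3_reserved_mask, classify_cr3 and the
enum values are hypothetical and exist only for illustration; the guest and
host physical-address widths are assumed to be passed in by the caller (in
KVM they would come from something like cpuid_maxphyaddr() and
boot_cpu_data.x86_phys_bits).  It only models the reserved-bit rule quoted
from 26.3.1.1, not the rest of VM entry checking.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask of CR3 bits that must be zero at VM entry
 * for a given physical-address width, i.e. bits 63:maxphyaddr.
 */
static uint64_t cr3_reserved_mask(unsigned int maxphyaddr)
{
	if (maxphyaddr > 52)
		maxphyaddr = 52;	/* bits 63:52 are always reserved */
	return ~0ULL << maxphyaddr;
}

/* Illustrative outcomes, not real KVM constants. */
enum cr3_verdict {
	CR3_OK,			/* passes both guest and host checks */
	CR3_FAIL_NESTED_ENTRY,	/* L1 set a bit reserved for its own width */
	CR3_HOST_WOULD_FAIL,	/* valid for L1, but hardware would reject it */
};

static enum cr3_verdict classify_cr3(uint64_t cr3,
				     unsigned int guest_maxphyaddr,
				     unsigned int host_maxphyaddr)
{
	/* L1's fault: fail the nested VM entry. */
	if (cr3 & cr3_reserved_mask(guest_maxphyaddr))
		return CR3_FAIL_NESTED_ENTRY;

	/* Corner case: bits between the hardware maximum and L1's maximum. */
	if (cr3 & cr3_reserved_mask(host_maxphyaddr))
		return CR3_HOST_WOULD_FAIL;

	return CR3_OK;
}
```

With a guest width of 40 bits and a host width of 36 bits, a CR3 value with
bit 38 set falls into the third category: it is legal from L1's point of
view, yet the VM entry in L0 would fail.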