On 22/03/2017 14:44, Wanpeng Li wrote: > 2017-03-22 20:00 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>: >> On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote: >>> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>: >>>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote: >>>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx> >>>>> >>>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that >>>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly: >>>>> >>>>> qemu-system-x86-2821 [003] d..2 45.848814: kvm_entry: vcpu 0 >>>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e >>>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_page_fault: address fe05b error_code 14 >>>>> >>>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT) >>>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is >>>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE >>>>> paging and EPT enabled. However, there is a progress to switch from Legacy >>>>> mode's such-mode Protected mode to Long mode during system boot, the check >>>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in >>>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually >>>>> the original commit should just intended to prevent to dereference L2's CR3 >>>>> if the L1 hypervisor emulates L2's real mode through vm8086. >>>>> >>>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and >>>>> !vm86_active. >>>>> >>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> >>>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> >>>>> Cc: Ladi Prosek <lprosek@xxxxxxxxxx> >>>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx> >>>>> --- >>>>> arch/x86/kvm/vmx.c | 4 ++-- >>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>> index c664365..2b2a05f 100644 >>>>> --- a/arch/x86/kvm/vmx.c >>>>> +++ b/arch/x86/kvm/vmx.c >>>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val) >>>>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept, >>>>> u32 *entry_failure_code) >>>>> { >>>>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) { >>>>> + if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) { >>>>> if (!nested_cr3_valid(vcpu, cr3)) { >>>>> *entry_failure_code = ENTRY_FAIL_DEFAULT; >>>>> return 1; >>>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne >>>>> * must not be dereferenced. >>>>> */ >>>>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) && >>>>> - !nested_ept) { >>>>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) { >>>> >>>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms >>>> as before 7ca29de21362. >>> >>> Hmm, I miss the function pdptrs_changed() will also dereference CR3. >>> How about something like this: >>> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>> index c664365..d7ebf03 100644 >>> --- a/arch/x86/kvm/vmx.c >>> +++ b/arch/x86/kvm/vmx.c >>> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu >>> *vcpu, unsigned long val) >>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long >>> cr3, bool nested_ept, >>> u32 *entry_failure_code) >>> { >>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) { >>> + if (cr3 != kvm_read_cr3(vcpu) || >>> + (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) && >>> + pdptrs_changed(vcpu))) { >>> if (!nested_cr3_valid(vcpu, cr3)) { >>> *entry_failure_code = ENTRY_FAIL_DEFAULT; >>> return 1; >>> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu >>> *vcpu, unsigned long cr3, bool ne >>> * must not be dereferenced. >>> */ >>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) && >>> - !nested_ept) { >>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) { >>> if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) { >>> *entry_failure_code = ENTRY_FAIL_PDPTE; >>> return 1; >> >> Still the same, Hyper-V is broken. The problem is not in real vs. >> protected mode. The way nested_ept_enabled is computed is incorrect. >> >> I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can >> you please give it a try? >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 98e82ee..9145c94 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu >> *vcpu, struct vmcs12 *vmcs12, >> vmcs12->guest_intr_status); >> } >> >> - nested_ept_enabled = (exec_control & >> SECONDARY_EXEC_ENABLE_EPT) != 0; >> + nested_ept_enabled = >> (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0; >> >> /* >> * Write an illegal value to APIC_ACCESS_ADDR. Later, > > You are right, it works. Please send out a formal patch and add the > kvm-unit-tests as Paolo mentioned. I do believe it would be more fair if _you_ send out the kvm-unit-tests patch. Paolo