2017-03-22 20:00 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>: > On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote: >> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>: >>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote: >>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx> >>>> >>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that >>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly: >>>> >>>> qemu-system-x86-2821 [003] d..2 45.848814: kvm_entry: vcpu 0 >>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e >>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_page_fault: address fe05b error_code 14 >>>> >>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT) >>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is >>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE >>>> paging and EPT enabled. However, there is a progress to switch from Legacy >>>> mode's such-mode Protected mode to Long mode during system boot, the check >>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in >>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually >>>> the original commit should just intended to prevent to dereference L2's CR3 >>>> if the L1 hypervisor emulates L2's real mode through vm8086. >>>> >>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and >>>> !vm86_active. >>>> >>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> >>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> >>>> Cc: Ladi Prosek <lprosek@xxxxxxxxxx> >>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx> >>>> --- >>>> arch/x86/kvm/vmx.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>> index c664365..2b2a05f 100644 >>>> --- a/arch/x86/kvm/vmx.c >>>> +++ b/arch/x86/kvm/vmx.c >>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val) >>>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept, >>>> u32 *entry_failure_code) >>>> { >>>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) { >>>> + if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) { >>>> if (!nested_cr3_valid(vcpu, cr3)) { >>>> *entry_failure_code = ENTRY_FAIL_DEFAULT; >>>> return 1; >>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne >>>> * must not be dereferenced. >>>> */ >>>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) && >>>> - !nested_ept) { >>>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) { >>> >>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms >>> as before 7ca29de21362. >> >> Hmm, I miss the function pdptrs_changed() will also dereference CR3. >> How about something like this: >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index c664365..d7ebf03 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu >> *vcpu, unsigned long val) >> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long >> cr3, bool nested_ept, >> u32 *entry_failure_code) >> { >> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) { >> + if (cr3 != kvm_read_cr3(vcpu) || >> + (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) && >> + pdptrs_changed(vcpu))) { >> if (!nested_cr3_valid(vcpu, cr3)) { >> *entry_failure_code = ENTRY_FAIL_DEFAULT; >> return 1; >> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu >> *vcpu, unsigned long cr3, bool ne >> * must not be dereferenced. >> */ >> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) && >> - !nested_ept) { >> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) { >> if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) { >> *entry_failure_code = ENTRY_FAIL_PDPTE; >> return 1; > > Still the same, Hyper-V is broken. The problem is not in real vs. > protected mode. The way nested_ept_enabled is computed is incorrect. > > I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can > you please give it a try? > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 98e82ee..9145c94 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu > *vcpu, struct vmcs12 *vmcs12, > vmcs12->guest_intr_status); > } > > - nested_ept_enabled = (exec_control & > SECONDARY_EXEC_ENABLE_EPT) != 0; > + nested_ept_enabled = > (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0; > > /* > * Write an illegal value to APIC_ACCESS_ADDR. Later, You are right, it works. Please send out a formal patch and add the kvm-unit-tests as Paolo mentioned. Regards, Wanpeng Li