On 22/08/2018 23:52, Sean Christopherson wrote:
> This fixes an issue where an "MMIO" emulation failure[1] in L2 is all
> but guaranteed to result in an infinite loop when TDP is enabled.
> Because "cr2" is actually an L2 GPA when TDP is enabled, calling
> kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped
> case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA
> and cause reexecute_instruction() to immediately return true.

The only way to get here would be if vcpu->arch.mmu.page_fault returned
RET_PF_EMULATE while in guest mode.  Should the change instead go in
kvm_mmu_page_fault, like this:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index af38644cf042..c9b734be685a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5261,7 +5261,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 		return 1;
 	}
 
-	if (mmio_info_in_cache(vcpu, cr2, direct))
+	/*
+	 * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we
+	 * can still optimistically try to just unprotect the page and let
+	 * the processor execute the instruction that caused the page fault.
+	 * Of course that's pointless if we know it's MMIO, but it also does
+	 * not work when running a nested guest, because...
+	 */
+	if (mmio_info_in_cache(vcpu, cr2, direct) || is_guest_mode(vcpu))
 		emulation_type = EMULTYPE_NO_REEXECUTE;
 emulate:
 	/*

So I guess the issue can arise only if L0 is using shadow page tables?
With nested EPT, mmu_is_nested(vcpu) should be true, and therefore
mmio_info_in_cache should return false.  That also deserves a mention
in the commit message, if it's indeed the case.

By the way, if we follow the suggestion in my review of patch 2, we
should also put that change first, and initialize emulation_type to
zero in kvm_mmu_page_fault at the same time, because the sole case
that needs EMULTYPE_RETRY is

-	int r, emulation_type = EMULTYPE_RETRY;
+	int r, emulation_type = 0;

...

-	if (mmio_info_in_cache(vcpu, cr2, direct))
+	/*
+	 * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we
+	 * can still optimistically try to just unprotect the page and let
+	 * the processor execute the instruction that caused the page fault.
+	 * Of course that's pointless if we know it's MMIO.
+	 */
+	if (!mmio_info_in_cache(vcpu, cr2, direct))
+		emulation_type = EMULTYPE_RETRY;

which then becomes, in this patch:

	if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu))
		emulation_type = EMULTYPE_RETRY;

> [1] This issue was encountered after commit 3a2936dedd20 ("kvm: mmu:
>     Don't expose private memslots to L2") changed the page fault path
>     to return KVM_PFN_NOSLOT when translating an L2 access to a
>     private memslot.  Returning KVM_PFN_NOSLOT is semantically correct
>     when we want to hide a memslot from L2, i.e. there effectively is
>     no defined memory region for L2, but it has the unfortunate side
>     effect of making KVM think the GFN is an MMIO page, thus triggering
>     emulation.  The failure occurred with in-development code that
>     deliberately exposed a private memslot to L2, which L2 accessed
>     with an instruction that is not emulated by KVM.

Can you write a testcase for tools/testing/selftests/kvm, or
alternatively for kvm-unit-tests?  It's okay if it only fails for
shadow page tables in L0.

But the priority for me is to understand the bugfix and refactor the
EMULTYPE mess.

Thanks,

Paolo
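
P.S. Just to make sure we mean the same end state: with both changes
folded together, the tail of kvm_mmu_page_fault() would look roughly
like the sketch below.  This is only an illustration pieced together
from the hunks above (the RET_PF_* handling and the emulate: tail are
elided), not a tested patch.

int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
		       void *insn, int insn_len)
{
	int r, emulation_type = 0;
	bool direct = vcpu->arch.mmu.direct_map;

	/* ... RET_PF_INVALID/RET_PF_RETRY handling as today ... */

	/*
	 * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we can
	 * still optimistically try to just unprotect the page and let the
	 * processor execute the instruction that caused the page fault.
	 * That is pointless if we know it's MMIO, and it cannot work for
	 * a nested guest, where cr2 is an L2 GPA rather than an L1 GVA.
	 */
	if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu))
		emulation_type = EMULTYPE_RETRY;
emulate:
	/* ... emulation and EMULATE_* result handling unchanged ... */
}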