On Thu, Aug 23, 2018 at 11:25:38AM +0200, Paolo Bonzini wrote: > On 22/08/2018 23:52, Sean Christopherson wrote: > > This fixes an issue where an "MMIO" emulation failure[1] in L2 is all > > but guaranteed to result in an infinite loop when TDP is enabled. > > Because "cr2" is actually an L2 GPA when TDP is enabled, calling > > kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped > > case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA > > and cause reexecute_instruction() to immediately return true. > > The only way to get here would be if vcpu->arch.mmu.page_fault > returned RET_PF_EMULATE while in guest mode. Should it be > kvm_mmu_page_fault instead that does: > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index af38644cf042..c9b734be685a 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -5261,7 +5261,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, > return 1; > } > > - if (mmio_info_in_cache(vcpu, cr2, direct)) > + /* > + * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we > + * can still optimistically try to just unprotect the page and let > + * the processor execute the instruction that caused the page fault. > + * Of course that's pointless if we know it's MMIO, but it also does > + * not work when running a nested guest, because... > + */ > + if (mmio_info_in_cache(vcpu, cr2, direct) || is_guest_mode(vcpu)) > emulation_type = EMULTYPE_NO_REEXECUTE; > emulate: > /* > > So I guess the issue can arise only if L0 is using shadow page tables? Yes, and I think only if L0 is also using EPT as that's the case where cr2 is actually a GPA and thus kvm_mmu_gva_to_gpa_write() fails. > With nested EPT, mmu_is_nested(vcpu) should be true, and therefore > mmio_info_in_cache should return false. That also deserves a mention > in the commit message, if it's indeed the case. Ha, I had that at some point but removed it because I thought that it'd be redundant/obvious information. > By the way, if we follow the suggestion in my review of patch 2, > we also should put that change first, and initialize emulation_type to > zero in kvm_mmu_page_fault at the same time, because the sole case that > needs EMULTYPE_RETRY is > > - int r, emulation_type = EMULTYPE_RETRY; > + int r, emulation_type = 0; > ... > - if (mmio_info_in_cache(vcpu, cr2, direct)) > + /* > + * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we > + * can still optimistically try to just unprotect the page and let > + * the processor execute the instruction that caused the page fault. > + * Of course that's pointless if we know it's MMIO. > + */ > + if (!mmio_info_in_cache(vcpu, cr2, direct) > emulation_type = EMULTYPE_RETRY; > > which then becomes in this patch. > > if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu)) > emulation_type = EMULTYPE_RETRY; I considered this approach, but ultimately re{try,execute}_instruction simply do not handle is_guest_mode() with EPT. Part of the reason I dug deep into the history was to try and understand whether or not we can ever legitimately retry an L2 instruction, EPT aside. I think the answer is no, so I put the check in re{try,execute}_instruction(). That being said, I agree that adding the check in kvm_mmu_page_fault() makes the desired behavior more obvious. And I also think removing EMULTYPE_NO_REEXECUTE (or at least making re-execute opt-in) is a very good thing for robustness and readability. So, what about adding !is_guest_mode() to kvm_mmu_page_fault(), and also add a WARN_ON_ONCE(is_guest_mode()) check in the retry functions? > > [1] This issue was encountered after commit 3a2936dedd20 ("kvm: mmu: > > Don't expose private memslots to L2") changed the page fault path > > to return KVM_PFN_NOSLOT when translating an L2 access to a > > prive memslot. Returning KVM_PFN_NOSLOT is semantically correct > > when we want to hide a memslot from L2, i.e. there effectively is > > no defined memory region for L2, but it has the unfortunate side > > effect of making KVM think the GFN is a MMIO page, thus triggering > > emulation. The failure occurred with in-development code that > > deliberately exposed a private memslot to L2, which L2 accessed > > with an instruction that is not emulated by KVM. > > Can you write a testcase for tools/testing/selftests/kvm, or alternatively > for kvm-unit-tests? It's okay if it only fails for shadow page tables in L0. > But the priority for me is to understand the bugfix and refactor the EMULTYPE > mess. Will do. > Thanks, > > Paolo