Re: [PATCH 1/2] KVM: x86: Do not re-{try,execute} after failed emulation in L2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 23, 2018 at 11:25:38AM +0200, Paolo Bonzini wrote:
> On 22/08/2018 23:52, Sean Christopherson wrote:
> > This fixes an issue where an "MMIO" emulation failure[1] in L2 is all
> > but guaranteed to result in an infinite loop when TDP is enabled.
> > Because "cr2" is actually an L2 GPA when TDP is enabled, calling
> > kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped
> > case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA
> > and cause reexecute_instruction() to immediately return true.
> 
> The only way to get here would be if vcpu->arch.mmu.page_fault
> returned RET_PF_EMULATE while in guest mode.  Should it be
> kvm_mmu_page_fault instead that does:
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index af38644cf042..c9b734be685a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5261,7 +5261,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
>  		return 1;
>  	}
>  
> -	if (mmio_info_in_cache(vcpu, cr2, direct))
> +	/*
> +	 * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we
> +	 * can still optimistically try to just unprotect the page and let
> +	 * the processor execute the instruction that caused the page fault.
> +	 * Of course that's pointless if we know it's MMIO, but it also does
> +	 * not work when running a nested guest, because...
> +	 */
> +	if (mmio_info_in_cache(vcpu, cr2, direct) || is_guest_mode(vcpu))
>  		emulation_type = EMULTYPE_NO_REEXECUTE;
>  emulate:
>  	/*
> 
> So I guess the issue can arise only if L0 is using shadow page tables?

Yes, and I think only if L0 is also using EPT as that's the case where
cr2 is actually a GPA and thus kvm_mmu_gva_to_gpa_write() fails.

> With nested EPT, mmu_is_nested(vcpu) should be true, and therefore
> mmio_info_in_cache should return false.  That also deserves a mention
> in the commit message, if it's indeed the case.

Ha, I had that at some point but removed it because I thought that
it'd be redundant/obvious information.
 
> By the way, if we follow the suggestion in my review of patch 2,
> we also should put that change first, and initialize emulation_type to
> zero in kvm_mmu_page_fault at the same time, because the sole case that
> needs EMULTYPE_RETRY is
> 
> -	int r, emulation_type = EMULTYPE_RETRY;
> +	int r, emulation_type = 0;
> ...
> -	if (mmio_info_in_cache(vcpu, cr2, direct))
> +	/*
> +	 * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we
> +	 * can still optimistically try to just unprotect the page and let
> +	 * the processor execute the instruction that caused the page fault.
> +	 * Of course that's pointless if we know it's MMIO.
> +	 */
> +	if (!mmio_info_in_cache(vcpu, cr2, direct)
>  		emulation_type = EMULTYPE_RETRY;
> 
> which then becomes in this patch.
> 
> 	if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu))
>  		emulation_type = EMULTYPE_RETRY;

I considered this approach, but ultimately re{try,execute}_instruction
simply do not handle is_guest_mode() with EPT.  Part of the reason I
dug deep into the history was to try and understand whether or not we
can ever legitimately retry an L2 instruction, EPT aside.  I think the
answer is no, so I put the check in re{try,execute}_instruction().

That being said, I agree that adding the check in kvm_mmu_page_fault()
makes the desired behavior more obvious.  And I also think removing
EMULTYPE_NO_REEXECUTE (or at least making re-execute opt-in) is a very
good thing for robustness and readability.

So, what about adding !is_guest_mode() to kvm_mmu_page_fault(), and
also add a WARN_ON_ONCE(is_guest_mode()) check in the retry functions?
 
> > [1] This issue was encountered after commit 3a2936dedd20 ("kvm: mmu:
> >     Don't expose private memslots to L2") changed the page fault path
> >     to return KVM_PFN_NOSLOT when translating an L2 access to a
> >     prive memslot.  Returning KVM_PFN_NOSLOT is semantically correct
> >     when we want to hide a memslot from L2, i.e. there effectively is
> >     no defined memory region for L2, but it has the unfortunate side
> >     effect of making KVM think the GFN is a MMIO page, thus triggering
> >     emulation.  The failure occurred with in-development code that
> >     deliberately exposed a private memslot to L2, which L2 accessed
> >     with an instruction that is not emulated by KVM.
> 
> Can you write a testcase for tools/testing/selftests/kvm, or alternatively
> for kvm-unit-tests?  It's okay if it only fails for shadow page tables in L0.
> But the priority for me is to understand the bugfix and refactor the EMULTYPE
> mess.

Will do.
 
> Thanks,
> 
> Paolo



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux