Re: [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults

Yuan Yao <yuan.yao@xxxxxxxxxxxxxxx> · Thu, 15 Aug 2024 16:30:39 +0800

On Wed, Aug 14, 2024 at 07:21:06AM -0700, Sean Christopherson wrote:
> On Wed, Aug 14, 2024, Yuan Yao wrote:
> > > @@ -5960,6 +5961,41 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> > >  	write_unlock(&vcpu->kvm->mmu_lock);
> > >  }
> > >
> > > +static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > > +				       u64 error_code, int *emulation_type)
> > > +{
> > > +	bool direct = vcpu->arch.mmu->root_role.direct;
> > > +
> > > +	/*
> > > +	 * Before emulating the instruction, check if the error code
> > > +	 * was due to a RO violation while translating the guest page.
> > > +	 * This can occur when using nested virtualization with nested
> > > +	 * paging in both guests. If true, we simply unprotect the page
> > > +	 * and resume the guest.
> > > +	 */
> > > +	if (direct &&
> > > +	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
> > > +		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
> > > +		return RET_PF_FIXED;
> > > +	}
> > > +
> > > +	/*
> > > +	 * The gfn is write-protected, but if emulation fails we can still
> > > +	 * optimistically try to just unprotect the page and let the processor
> > > +	 * re-execute the instruction that caused the page fault.  Do not allow
> > > +	 * retrying MMIO emulation, as it's not only pointless but could also
> > > +	 * cause us to enter an infinite loop because the processor will keep
> > > +	 * faulting on the non-existent MMIO address.  Retrying an instruction
> > > +	 * from a nested guest is also pointless and dangerous as we are only
> > > +	 * explicitly shadowing L1's page tables, i.e. unprotecting something
> > > +	 * for L1 isn't going to magically fix whatever issue cause L2 to fail.
> > > +	 */
> > > +	if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
> >
> > Looks the mmio_info_in_cache() checking can be removed,
> > emulation should not come here with RET_PF_WRITE_PROTECTED
> > introduced, may WARN_ON_ONCE() it.
>
> Yeah, that was my instinct as well.  I kept it mostly because I liked having the
> comment, but also because I was thinking the cache could theoretically get a hit.
> But that's not true.  KVM should return RET_PF_WRITE_PROTECTED if and only if
> there is a memslot, and creating a memslot is supposed to invalidate the MMIO
> cache by virtue of changing the memslot generation.
>
> Unless someone feels strongly that the mmio_info_in_cache() call should be
> deleted entirely, I'll tack on this patch.  The cache lookup is cheap, and IMO
> it's helpful to explicitly document that it should be impossible to reach this
> point with what appears to be MMIO.
>
> ---
>  arch/x86/kvm/mmu/mmu.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 50695eb2ee22..7f3f57237f23 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5997,6 +5997,18 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  	vcpu->arch.last_retry_eip = 0;
>  	vcpu->arch.last_retry_addr = 0;
>
> +	/*
> +	 * It should be impossible to reach this point with an MMIO cache hit,
> +	 * as RET_PF_WRITE_PROTECTED is returned if and only if there's a valid,
> +	 * writable memslot, and creating a memslot should invalidate the MMIO
> +	 * cache by way of changing the memslot generation.  WARN and disallow
> +	 * retry if MMIO is detect, as retrying MMIO emulation is pointless and
> +	 * could put the vCPU into an infinite loop because the processor will
> +	 * keep faulting on the non-existent MMIO address.
> +	 */
> +	if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct)))
> +		return RET_PF_EMULATE;
> +

LGTM.

Reviewed-by: Yuan Yao <yuan.yao@xxxxxxxxx>

>  	/*
>  	 * Before emulating the instruction, check to see if the access may be
>  	 * due to L1 accessing nested NPT/EPT entries used for L2, i.e. if the
> @@ -6029,17 +6041,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		return RET_PF_FIXED;
>
>  	/*
> -	 * The gfn is write-protected, but if emulation fails we can still
> -	 * optimistically try to just unprotect the page and let the processor
> +	 * The gfn is write-protected, but if KVM detects its emulating an
> +	 * instruction that is unlikely to be used to modify page tables, or if
> +	 * emulation fails, KVM can try to unprotect the gfn and let the CPU
>  	 * re-execute the instruction that caused the page fault.  Do not allow
> -	 * retrying MMIO emulation, as it's not only pointless but could also
> -	 * cause us to enter an infinite loop because the processor will keep
> -	 * faulting on the non-existent MMIO address.  Retrying an instruction
> -	 * from a nested guest is also pointless and dangerous as we are only
> -	 * explicitly shadowing L1's page tables, i.e. unprotecting something
> -	 * for L1 isn't going to magically fix whatever issue cause L2 to fail.
> +	 * retrying an instruction from a nested guest as KVM is only explicitly
> +	 * shadowing L1's page tables, i.e. unprotecting something for L1 isn't
> +	 * going to magically fix whatever issue cause L2 to fail.
>  	 */
> -	if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
> +	if (!is_guest_mode(vcpu))
>  		*emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
>
>  	return RET_PF_EMULATE;
>
> base-commit: 7d33880356496eb0640c6c825cc60898063c4902
> --