Re: [PATCH] KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This also fixed an Accessed bit related bug in TDX, where
"KVM_BUG_ON(was_present, kvm)" can be hit in below scenario:


     CPU 0                 CPU 1
                           1. TD accepts GPA A
                           2. EPT violation in KVM
3. Prefault GPA A
4. Install an SPTE with
   Accessed bit unset
                           5. Install an SPTE with
                              Accessed bit set
                           6. KVM_BUG_ON() in set_external_spte_present()


Reviewed-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>

On Wed, Dec 18, 2024 at 01:36:11PM -0800, Sean Christopherson wrote:
> Treat slow-path TDP MMU faults as spurious if the access is allowed given
> the existing SPTE to fix a benign warning (other than the WARN itself)
> due to replacing a writable SPTE with a read-only SPTE, and to avoid the
> unnecessary LOCK CMPXCHG and subsequent TLB flush.
> 
> If a read fault races with a write fault, fast GUP fails for any reason
> when trying to "promote" the read fault to a writable mapping, and KVM
> resolves the write fault first, then KVM will end up trying to install a
> read-only SPTE (for a !map_writable fault) overtop a writable SPTE.
> 
> Note, it's not entirely clear why fast GUP fails, or if that's even how
> KVM ends up with a !map_writable fault with a writable SPTE.  If something
> else is going awry, e.g. due to a bug in mmu_notifiers, then treating read
> faults as spurious in this scenario could effectively mask the underlying
> problem.
> 
> However, retrying the faulting access instead of overwriting an existing
> SPTE is functionally correct and desirable irrespective of the WARN, and
> fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed
> bit in primary MMU's PTE is toggled and causes a PTE value mismatch.  The
> WARN was also recently added, specifically to track down scenarios where
> KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious
> doesn't regress KVM's bug-finding capabilities in any way.  In short,
> letting the WARN linger because there's a tiny chance it's due to a bug
> elsewhere would be excessively paranoid.
> 
> Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable")
> Reported-by: leiyang@xxxxxxxxxx
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 12 ------------
>  arch/x86/kvm/mmu/spte.h    | 17 +++++++++++++++++
>  arch/x86/kvm/mmu/tdp_mmu.c |  5 +++++
>  3 files changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 22e7ad235123..2401606db260 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3364,18 +3364,6 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> -static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> -{
> -	if (fault->exec)
> -		return is_executable_pte(spte);
> -
> -	if (fault->write)
> -		return is_writable_pte(spte);
> -
> -	/* Fault was on Read access */
> -	return spte & PT_PRESENT_MASK;
> -}
> -
>  /*
>   * Returns the last level spte pointer of the shadow page walk for the given
>   * gpa, and sets *spte to the spte value. This spte may be non-preset. If no
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index f332b33bc817..af10bc0380a3 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -461,6 +461,23 @@ static inline bool is_mmu_writable_spte(u64 spte)
>  	return spte & shadow_mmu_writable_mask;
>  }
>  
> +/*
> + * Returns true if the access indicated by @fault is allowed by the existing
> + * SPTE protections.  Note, the caller is responsible for checking that the
> + * SPTE is a shadow-present, leaf SPTE (either before or after).
> + */
> +static inline bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> +{
> +	if (fault->exec)
> +		return is_executable_pte(spte);
> +
> +	if (fault->write)
> +		return is_writable_pte(spte);
> +
> +	/* Fault was on Read access */
> +	return spte & PT_PRESENT_MASK;
> +}
> +
>  /*
>   * If the MMU-writable flag is cleared, i.e. the SPTE is write-protected for
>   * write-tracking, remote TLBs must be flushed, even if the SPTE was read-only,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 4508d868f1cd..2f15e0e33903 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -985,6 +985,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
>  	if (fault->prefetch && is_shadow_present_pte(iter->old_spte))
>  		return RET_PF_SPURIOUS;
>  
> +	if (is_shadow_present_pte(iter->old_spte) &&
> +	    is_access_allowed(fault, iter->old_spte) &&
> +	    is_last_spte(iter->old_spte, iter->level))
One nit:
Do we need to warn on pfn_changed?

> +		return RET_PF_SPURIOUS;
> +
>  	if (unlikely(!fault->slot))
>  		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
>  	else
> 
> base-commit: 3522c419758ee8dca5a0e8753ee0070a22157bc1
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
> 
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux