This also fixes an Accessed-bit-related bug in TDX, where "KVM_BUG_ON(was_present, kvm)" can be hit in the below scenario:

  CPU 0                                    CPU 1
  1. TD accepts GPA A
  2. EPT violation in KVM
                                           3. Prefault GPA A
                                           4. Install an SPTE with Accessed bit unset
  5. Install an SPTE with Accessed bit set
  6. KVM_BUG_ON() in set_external_spte_present()

Reviewed-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>

On Wed, Dec 18, 2024 at 01:36:11PM -0800, Sean Christopherson wrote:
> Treat slow-path TDP MMU faults as spurious if the access is allowed given
> the existing SPTE to fix a benign warning (other than the WARN itself)
> due to replacing a writable SPTE with a read-only SPTE, and to avoid the
> unnecessary LOCK CMPXCHG and subsequent TLB flush.
>
> If a read fault races with a write fault, fast GUP fails for any reason
> when trying to "promote" the read fault to a writable mapping, and KVM
> resolves the write fault first, then KVM will end up trying to install a
> read-only SPTE (for a !map_writable fault) overtop a writable SPTE.
>
> Note, it's not entirely clear why fast GUP fails, or if that's even how
> KVM ends up with a !map_writable fault with a writable SPTE.  If something
> else is going awry, e.g. due to a bug in mmu_notifiers, then treating read
> faults as spurious in this scenario could effectively mask the underlying
> problem.
>
> However, retrying the faulting access instead of overwriting an existing
> SPTE is functionally correct and desirable irrespective of the WARN, and
> fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed
> bit in the primary MMU's PTE is toggled and causes a PTE value mismatch.
> The WARN was also recently added, specifically to track down scenarios
> where KVM unnecessarily overwrites SPTEs, i.e. treating the fault as
> spurious doesn't regress KVM's bug-finding capabilities in any way.  In
> short, letting the WARN linger because there's a tiny chance it's due to
> a bug elsewhere would be excessively paranoid.
>
> Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable")
> Reported-by: leiyang@xxxxxxxxxx
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 12 ------------
>  arch/x86/kvm/mmu/spte.h    | 17 +++++++++++++++++
>  arch/x86/kvm/mmu/tdp_mmu.c |  5 +++++
>  3 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 22e7ad235123..2401606db260 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3364,18 +3364,6 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> -static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> -{
> -	if (fault->exec)
> -		return is_executable_pte(spte);
> -
> -	if (fault->write)
> -		return is_writable_pte(spte);
> -
> -	/* Fault was on Read access */
> -	return spte & PT_PRESENT_MASK;
> -}
> -
>  /*
>   * Returns the last level spte pointer of the shadow page walk for the given
>   * gpa, and sets *spte to the spte value. This spte may be non-preset. If no
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index f332b33bc817..af10bc0380a3 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -461,6 +461,23 @@ static inline bool is_mmu_writable_spte(u64 spte)
>  	return spte & shadow_mmu_writable_mask;
>  }
>  
> +/*
> + * Returns true if the access indicated by @fault is allowed by the existing
> + * SPTE protections.  Note, the caller is responsible for checking that the
> + * SPTE is a shadow-present, leaf SPTE (either before or after).
> + */
> +static inline bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> +{
> +	if (fault->exec)
> +		return is_executable_pte(spte);
> +
> +	if (fault->write)
> +		return is_writable_pte(spte);
> +
> +	/* Fault was on Read access */
> +	return spte & PT_PRESENT_MASK;
> +}
> +
>  /*
>   * If the MMU-writable flag is cleared, i.e. the SPTE is write-protected for
>   * write-tracking, remote TLBs must be flushed, even if the SPTE was read-only,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 4508d868f1cd..2f15e0e33903 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -985,6 +985,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
>  	if (fault->prefetch && is_shadow_present_pte(iter->old_spte))
>  		return RET_PF_SPURIOUS;
>  
> +	if (is_shadow_present_pte(iter->old_spte) &&
> +	    is_access_allowed(fault, iter->old_spte) &&
> +	    is_last_spte(iter->old_spte, iter->level))

One nit: Do we need to warn on pfn_changed?

> +		return RET_PF_SPURIOUS;
> +
>  	if (unlikely(!fault->slot))
>  		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
>  	else
>
> base-commit: 3522c419758ee8dca5a0e8753ee0070a22157bc1
> --
> 2.47.1.613.gc27f4b7a9f-goog
>