Re: [PATCH] KVM: x86/tdp_mmu: Trigger the callback only when an interesting change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 27, 2024, Sean Christopherson wrote:
> On Fri, Sep 27, 2024, Sean Christopherson wrote:
> > On Thu, Sep 26, 2024, Yan Zhao wrote:
> > > On Thu, Sep 12, 2024 at 05:07:57PM -0700, Sean Christopherson wrote:
> > > > On Thu, Sep 12, 2024, Isaku Yamahata wrote:
> > > > Right now, the fixes for make_spte() are sitting toward the end of the massive
> > > > kvm_follow_pfn() rework (80+ patches and counting), but despite the size, I am
> > > > fairly confident that series can land in 6.13 (lots and lots of small patches).
> > > > 
> > > > ---
> > > > Author:     Sean Christopherson <seanjc@xxxxxxxxxx>
> > > > AuthorDate: Thu Sep 12 16:23:21 2024 -0700
> > > > Commit:     Sean Christopherson <seanjc@xxxxxxxxxx>
> > > > CommitDate: Thu Sep 12 16:35:06 2024 -0700
> > > > 
> > > >     KVM: x86/mmu: Flush TLBs if resolving a TDP MMU fault clears W or D bits
> > > >     
> > > >     Do a remote TLB flush if installing a leaf SPTE overwrites an existing
> > > >     leaf SPTE (with the same target pfn) and clears the Writable bit or the
> > > >     Dirty bit.  KVM isn't _supposed_ to clear Writable or Dirty bits in such
> > > >     a scenario, but make_spte() has a flaw where it will fail to set the Dirty
> > > >     if the existing SPTE is writable.
> > > >     
> > > >     E.g. if two vCPUs race to handle faults, the KVM will install a W=1,D=1
> > > >     SPTE for the first vCPU, and then overwrite it with a W=1,D=0 SPTE for the
> > > >     second vCPU.  If the first vCPU (or another vCPU) accesses memory using
> > > >     the W=1,D=1 SPTE, i.e. creates a writable, dirty TLB entry, and that is
> > > >     the only SPTE that is dirty at the time of the next relevant clearing of
> > > >     the dirty logs, then clear_dirty_gfn_range() will not modify any SPTEs
> > > >     because it sees the D=0 SPTE, and thus will complete the clearing of the
> > > >     dirty logs without performing a TLB flush.
> > > But it looks that kvm_flush_remote_tlbs_memslot() will always be invoked no
> > > matter clear_dirty_gfn_range() finds a D bit or not.
> > 
> > Oh, right, I forgot about that.  I'll tweak the changelog to call that out before
> > posting.  Hmm, and I'll drop the Cc: stable@ too, as commit b64d740ea7dd ("kvm:
> > x86: mmu: Always flush TLBs when enabling dirty logging") was a bug fix, i.e. if
> > anything should be backported it's that commit.
> 
> Actually, a better idea.  I think it makes sense to fully commit to not flushing
> when overwriting SPTEs, and instead rely on the dirty logging logic to do a remote
> TLB flush.

Oooh, but there's a bug.  KVM can tolerate/handle stale Dirty/Writable TLB entries
when dirty logging, but KVM cannot tolerate stale Writable TLB entries when write-
protecting for shadow paging.  The TDP MMU always flushes when clearing the MMU-
writable flag (modulo a bug that would cause KVM to make the SPTE !MMU-writable
in the page fault path), but the shadow MMU does not.

So I'm pretty sure we need the below, and then it may or may not make sense to have
a common "flush needed" helper (outside of the write-protecting flows, KVM probably
should WARN if MMU-writable is cleared).

---
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ce8323354d2d..7bd9c296f70e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -514,9 +514,12 @@ static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
 /* Rules for using mmu_spte_update:
  * Update the state bits, it means the mapped pfn is not changed.
  *
- * Whenever an MMU-writable SPTE is overwritten with a read-only SPTE, remote
- * TLBs must be flushed. Otherwise rmap_write_protect will find a read-only
- * spte, even though the writable spte might be cached on a CPU's TLB.
+ * If the MMU-writable flag is cleared, i.e. the SPTE is write-protected for
+ * write-tracking, remote TLBs must be flushed, even if the SPTE was read-only,
+ * as KVM allows stale Writable TLB entries to exist.  When dirty logging, KVM
+ * flushes TLBs based on whether or not dirty bitmap/ring entries were reaped,
+ * not whether or not SPTEs were modified, i.e. only the write-protected case
+ * needs to precisely flush when modifying SPTEs.
  *
  * Returns true if the TLB needs to be flushed
  */
@@ -533,8 +536,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
         * we always atomically update it, see the comments in
         * spte_has_volatile_bits().
         */
-       if (is_mmu_writable_spte(old_spte) &&
-             !is_writable_pte(new_spte))
+       if (is_mmu_writable_spte(old_spte) && !is_mmu_writable_spte(new_spte))
                flush = true;
 
        /*
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 345c7115b632..aa1ca24d1168 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -133,12 +133,6 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
  */
 bool spte_has_volatile_bits(u64 spte)
 {
-       /*
-        * Always atomically update spte if it can be updated
-        * out of mmu-lock, it can ensure dirty bit is not lost,
-        * also, it can help us to get a stable is_writable_pte()
-        * to ensure tlb flush is not missed.
-        */
        if (!is_writable_pte(spte) && is_mmu_writable_spte(spte))
                return true;




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux