Re: [PATCH v2 4/4] KVM: x86/mmu: Fix a *very* theoretical race in kvm_mmu_track_write()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/3/24 01:23, Sean Christopherson wrote:
Add full memory barriers in kvm_mmu_track_write() and account_shadowed()
to plug a (very, very theoretical) race where kvm_mmu_track_write() could
miss a 0->1 transition of indirect_shadow_pages and fail to zap relevant,
*stale* SPTEs.

Ok, so we have

emulator_write_phys
  overwrite PTE
  kvm_page_track_write
    kvm_mmu_track_write
      // memory barrier missing here
      if (indirect_shadow_pages)
        zap();

and on the other side

  FNAME(page_fault)
    FNAME(fetch)
      kvm_mmu_get_child_sp
        kvm_mmu_get_shadow_page
          __kvm_mmu_get_shadow_page
            kvm_mmu_alloc_shadow_page
              account_shadowed
                indirect shadow pages++
                // memory barrier missing here
      if (FNAME(gpte_changed)) // reads PTE
        goto out

If you can weave something like this in the commit message the sequence would be a bit clearer.

In practice, this bug is likely benign as both the 0=>1 transition and
reordering of this scope are extremely rare occurrences.

I wouldn't call it benign, it's more that it's unobservable in practice but the race is real. However...
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c193b096b45..86b85060534d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -830,6 +830,14 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
  	struct kvm_memory_slot *slot;
  	gfn_t gfn;
+ /*
+	 * Ensure indirect_shadow_pages is elevated prior to re-reading guest
+	 * child PTEs in FNAME(gpte_changed), i.e. guarantee either in-flight
+	 * emulated writes are visible before re-reading guest PTEs, or that
+	 * an emulated write will see the elevated count and acquire mmu_lock
+	 * to update SPTEs.  Pairs with the smp_mb() in kvm_mmu_track_write().
+	 */
+	smp_mb();

... this memory barrier needs to be after the increment (the desired ordering is store-before-read).

Paolo

  	kvm->arch.indirect_shadow_pages++;
  	gfn = sp->gfn;
  	slots = kvm_memslots_for_spte_role(kvm, sp->role);
@@ -5747,10 +5755,15 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
  	bool flush = false;
/*
-	 * If we don't have indirect shadow pages, it means no page is
-	 * write-protected, so we can exit simply.
+	 * When emulating guest writes, ensure the written value is visible to
+	 * any task that is handling page faults before checking whether or not
+	 * KVM is shadowing a guest PTE.  This ensures either KVM will create
+	 * the correct SPTE in the page fault handler, or this task will see
+	 * a non-zero indirect_shadow_pages.  Pairs with the smp_mb() in
+	 * account_shadowed().
  	 */
-	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+	smp_mb();
+	if (!vcpu->kvm->arch.indirect_shadow_pages)
  		return;
write_lock(&vcpu->kvm->mmu_lock);





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux