Re: [PATCH v2] kvm: x86: mmu: Always flush TLBs when enabling dirty logging

On Thu, Jul 28, 2022, Junaid Shahid wrote:
>  	/*
> +	 * The caller will flush the TLBs after this function returns.
> +	 *

This comment is still stale, e.g. it contains a blurb that talks about skipping
the flush based on MMU-writable.

	 * So to determine if a TLB flush is truly required, KVM
	 * will clear a separate software-only bit (MMU-writable) and skip the
	 * flush if-and-only-if this bit was already clear.

My preference is to drop this comment entirely and fold it into a single mega
comment in kvm_mmu_slot_apply_flags().  More below.

>  	 * It's also safe to flush TLBs out of mmu lock here as currently this
>  	 * function is only used for dirty logging, in which case flushing TLB
>  	 * out of mmu lock also guarantees no dirty pages will be lost in
>  	 * dirty_bitmap.
>  	 */
> -	if (flush)
> -		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
>  }

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f389691d8c04..f8b215405fe3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12448,6 +12448,25 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>  		} else {
>  			kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_4K);
>  		}
> +
> +		/*
> +		 * We need to flush the TLBs in either of the following cases:

Please avoid "we" and pronouns in general.  It's fairly obvious that "we" refers
to KVM in this case, but oftentimes pronouns can be ambiguous, e.g. "we" can refer
to the developer, userspace, KVM, etc...

Smushing the two comments together, how about this as a fixup?

---
 arch/x86/kvm/mmu/mmu.c | 23 ------------------
 arch/x86/kvm/x86.c     | 55 ++++++++++++++++++++++++++++++------------
 2 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 14d543f8373c..749c2d39c7bc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6097,29 +6097,6 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 		kvm_tdp_mmu_wrprot_slot(kvm, memslot, start_level);
 		read_unlock(&kvm->mmu_lock);
 	}
-
-	/*
-	 * The caller will flush TLBs to ensure that guest writes are reflected
-	 * in the dirty bitmap before the memslot update completes, i.e. before
-	 * enabling dirty logging is visible to userspace.
-	 *
-	 * Perform the TLB flush outside the mmu_lock to reduce the amount of
-	 * time the lock is held. However, this does mean that another CPU can
-	 * now grab mmu_lock and encounter a write-protected SPTE while CPUs
-	 * still have a writable mapping for the associated GFN in their TLB.
-	 *
-	 * This is safe but requires KVM to be careful when making decisions
-	 * based on the write-protection status of an SPTE. Specifically, KVM
-	 * also write-protects SPTEs to monitor changes to guest page tables
-	 * during shadow paging, and must guarantee no CPUs can write to those
-	 * page before the lock is dropped. As mentioned in the previous
-	 * paragraph, a write-protected SPTE is no guarantee that CPU cannot
-	 * perform writes. So to determine if a TLB flush is truly required, KVM
-	 * will clear a separate software-only bit (MMU-writable) and skip the
-	 * flush if-and-only-if this bit was already clear.
-	 *
-	 * See is_writable_pte() for more details.
-	 */
 }

 static inline bool need_topup(struct kvm_mmu_memory_cache *cache, int min)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a5e0be2c8ef..430ca4d304a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12474,21 +12474,46 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 		}

 		/*
-		 * We need to flush the TLBs in either of the following cases:
-		 *
-		 * 1. We had to clear the Dirty bits for some SPTEs
-		 * 2. We had to write-protect some SPTEs and any of those SPTEs
-		 *    had the MMU-Writable bit set, regardless of whether the
-		 *    actual hardware Writable bit was set. This is because as
-		 *    long as the SPTE is MMU-Writable, some CPU may still have
-		 *    writable TLB entries for it, even after the Writable bit
-		 *    has been cleared. For more details, see the comments for
-		 *    is_writable_pte() [specifically the case involving
-		 *    access-tracking SPTEs].
-		 *
-		 * In almost all cases, one of the above conditions will be true.
-		 * So it is simpler (and probably slightly more efficient) to
-		 * just flush the TLBs unconditionally.
+		 * Unconditionally flush the TLBs after enabling dirty logging.
+		 * A flush is almost always going to be necessary (see below),
+		 * and unconditionally flushing allows the helpers to omit
+		 * the subtly complex checks when removing write access.
+		 *
+		 * Do the flush outside of mmu_lock to reduce the amount of
+		 * time mmu_lock is held.  Flushing after dropping mmu_lock is
+		 * safe as KVM only needs to guarantee the slot is fully
+		 * write-protected before returning to userspace, i.e. before
+		 * userspace can consume the dirty status.
+		 *
+		 * Flushing outside of mmu_lock requires KVM to be careful when
+		 * making decisions based on writable status of an SPTE, e.g. a
+		 * !writable SPTE doesn't guarantee a CPU can't perform writes.
+		 *
+		 * Specifically, KVM also write-protects guest page tables to
+		 * monitor changes when using shadow paging, and must guarantee
+		 * no CPUs can write to those pages before mmu_lock is dropped.
+		 * Because CPUs may have stale TLB entries at this point, a
+		 * !writable SPTE doesn't guarantee CPUs can't perform writes.
+		 *
+		 * KVM also allows making SPTEs writable outside of mmu_lock,
+		 * e.g. to allow dirty logging without taking mmu_lock.
+		 *
+		 * To handle these scenarios, KVM uses a separate software-only
+		 * bit (MMU-writable) to track if a SPTE is !writable due to
+		 * a guest page table being write-protected (KVM clears the
+		 * MMU-writable flag when write-protecting for shadow paging).
+		 *
+		 * The use of MMU-writable is also the primary motivation for
+		 * the unconditional flush.  Because KVM must guarantee that a
+		 * CPU doesn't contain stale, writable TLB entries for a
+		 * !MMU-writable SPTE, KVM must flush if it encounters any
+		 * MMU-writable SPTE regardless of whether the actual hardware
+		 * writable bit was set.  I.e. KVM is almost guaranteed to need
+		 * to flush, while unconditionally flushing allows the "remove
+		 * write access" helpers to ignore MMU-writable entirely.
+		 *
+		 * See is_writable_pte() for more details (the case involving
+		 * access-tracked SPTEs is particularly relevant).
 		 */
 		kvm_arch_flush_remote_tlbs_memslot(kvm, new);
 	}

base-commit: c00bb4ce5a8aa2156b31ac6b18285e52e1762d21
--
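
For illustration only, here is a toy model of the flush rule described in the
comment above. It is a sketch under stated assumptions, not KVM code: the
TOY_* bit positions and the toy_*() helpers are hypothetical names invented
for this example and do not match KVM's real SPTE layout or helpers. The point
it tries to show is that a rule keyed on the hardware writable bit misses an
SPTE that is MMU-writable but currently !writable (e.g. access-tracked), which
is why a precise check would have to key off MMU-writable instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define TOY_PT_WRITABLE		(1ull << 1)	/* hardware writable bit */
#define TOY_MMU_WRITABLE	(1ull << 58)	/* software-only bit */

/*
 * Naive rule: flush only if the hardware writable bit was set, i.e. only
 * if write-protecting actually changed the bit that hardware consumes.
 */
static bool toy_naive_needs_flush(uint64_t old_spte)
{
	return (old_spte & TOY_PT_WRITABLE) != 0;
}

/*
 * Toy write-protect for dirty logging: clear the hardware writable bit
 * but leave MMU-writable alone.  Report a flush as needed if the SPTE
 * was MMU-writable, regardless of the hardware writable bit, because a
 * CPU may still hold a stale, writable TLB entry for such an SPTE.
 */
static bool toy_wrprot_needs_flush(uint64_t *spte)
{
	bool mmu_writable = (*spte & TOY_MMU_WRITABLE) != 0;

	*spte &= ~TOY_PT_WRITABLE;
	return mmu_writable;
}

/* Write-protect every SPTE in a toy "slot" and accumulate the result. */
static bool toy_slot_wrprot(uint64_t *sptes, size_t nr_sptes)
{
	bool flush = false;
	size_t i;

	for (i = 0; i < nr_sptes; i++)
		flush |= toy_wrprot_needs_flush(&sptes[i]);

	return flush;
}
```

In this model, an SPTE with only TOY_MMU_WRITABLE set makes
toy_naive_needs_flush() return false while toy_wrprot_needs_flush() returns
true.  Since nearly every SPTE in a slot that is being write-protected for
dirty logging would be MMU-writable, toy_slot_wrprot() would return true in
practically all cases, which matches the argument for flushing
unconditionally and dropping the check from the helpers.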



