[PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When the TDP MMU is write-protection GFNs for page table protection (as
opposed to for dirty logging, or due to the HVA not being writable), it
checks if the SPTE is already write-protected and if so skips modifying
the SPTE and the TLB flush.

This behavior is incorrect because the SPTE may be write-protected for
dirty logging. This implies that the SPTE could be locklessly be made
writable on the next write access, and that vCPUs could still be running
with writable SPTEs cached in their TLB.

Fix this by unconditionally setting the SPTE and only skipping the TLB
flush if the SPTE was already marked !MMU-writable or !Host-writable,
which guarantees the SPTE cannot be locklessly be made writable and no
vCPUs are running the writable SPTEs cached in their TLBs.

Technically it would be safe to skip setting the SPTE as well since:

  (a) If MMU-writable is set then Host-writable must be cleared
      and the only way to set Host-writable is to fault the SPTE
      back in entirely (at which point any unsynced shadow pages
      reachable by the new SPTE will be synced and MMU-writable can
      be safetly be set again).

  and

  (b) MMU-writable is never consulted on its own.

And in fact this is what the shadow MMU does when write-protecting guest
page tables. However setting the SPTE unconditionally is much easier to
reason about and does not require a huge comment explaining why it is safe.

Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1bc816b7c3..462c6de9f944 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1423,14 +1423,16 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
  * MMU-writable bit to ensure future writes continue to be intercepted.
- * Returns true if an SPTE was set and a TLB flush is needed.
+ *
+ * Returns true if a TLB flush is needed to ensure no CPU has a writable
+ * version of the SPTE in its TLB.
  */
 static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 			      gfn_t gfn, int min_level)
 {
 	struct tdp_iter iter;
 	u64 new_spte;
-	bool spte_set = false;
+	bool flush = false;
 
 	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
 
@@ -1442,19 +1444,30 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
-		if (!is_writable_pte(iter.old_spte))
-			break;
-
 		new_spte = iter.old_spte &
 			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 
 		tdp_mmu_set_spte(kvm, &iter, new_spte);
-		spte_set = true;
+
+		/*
+		 * The TLB flush can be skipped if the old SPTE cannot be
+		 * locklessly be made writable, which implies it is already
+		 * write-protected due to being !MMU-writable or !Host-writable.
+		 * This guarantees no CPU currently has a writable version of
+		 * this SPTE in its TLB.
+		 *
+		 * Otherwise the old SPTE was either not write-protected or was
+		 * write-protected but for dirty logging (which does not flush
+		 * TLBs before dropping the MMU lock), so a TLB flush is
+		 * required.
+		 */
+		if (spte_can_locklessly_be_made_writable(iter.old_spte))
+			flush = true;
 	}
 
 	rcu_read_unlock();
 
-	return spte_set;
+	return flush;
 }
 
 /*

base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
-- 
2.34.1.703.g22d0c6ccf7-goog




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux