On Fri, Feb 28, 2025 at 11:18:04AM -0800, Dave Hansen wrote:
> We haven't talked at all about the locking rules for
> invlpgb_flush_all(). It was used once in this series without any
> explicit preempt twiddling. I assume that was because it was used in a
> path where preempt is disabled.
>
> If it does need a universal rule about preempt, can we please add an:
>
> 	lockdep_assert_preemption_disabled()
>
> along with a comment about why it needs preempt disabled?

So, after talking on IRC last night, below is what I think we should do
on top. More specifically:

- I've pushed the preemption guard inside the functions which do
  INVLPGB+TLBSYNC so that callers do not have to think about it.

- invlpgb_kernel_range_flush() I still don't like; there we have to
  rely on cant_migrate() in __tlbsync().

- I'd like for all of them to be nicely packed but don't have an idea
  yet how to do that cleanly...

- Document what it means for bits rax[0:2] to be clear when issuing
  INVLPGB.

That ok? Anything I've missed? If not, I'll integrate this into the
patches.

Thx.

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 45d9c7687d61..0d90ceeb472b 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -39,6 +39,10 @@ static inline void invlpg(unsigned long addr)
  * the first page, while __invlpgb gets the more human readable number of
  * pages to invalidate.
  *
+ * The bits in rax[0:2] determine respectively which components of the address
+ * (VA, PCID, ASID) get compared when flushing. If none of the bits is set,
+ * *any* address in the specified range matches.
+ *
  * TLBSYNC is used to ensure that pending INVLPGB invalidations initiated from
  * this CPU have completed.
  */
@@ -60,10 +64,10 @@ static inline void __invlpgb(unsigned long asid, unsigned long pcid,
 static inline void __tlbsync(void)
 {
 	/*
-	 * tlbsync waits for invlpgb instructions originating on the
-	 * same CPU to have completed. Print a warning if we could have
-	 * migrated, and might not be waiting on all the invlpgbs issued
-	 * during this TLB invalidation sequence.
+	 * TLBSYNC waits for INVLPGB instructions originating on the same CPU
+	 * to have completed. Print a warning if the task has been migrated,
+	 * and might not be waiting on all the INVLPGBs issued during this TLB
+	 * invalidation sequence.
 	 */
 	cant_migrate();
 
@@ -106,6 +110,13 @@ static inline void invlpgb_flush_single_pcid_nosync(unsigned long pcid)
 /* Flush all mappings, including globals, for all PCIDs. */
 static inline void invlpgb_flush_all(void)
 {
+	/*
+	 * TLBSYNC at the end needs to make sure all flushes done on the
+	 * current CPU have been executed system-wide. Therefore, make
+	 * sure nothing gets migrated in-between but disable preemption
+	 * as it is cheaper.
+	 */
+	guard(preempt)();
 	__invlpgb(0, 0, 0, 1, 0, INVLPGB_INCLUDE_GLOBAL);
 	__tlbsync();
 }
@@ -119,10 +130,7 @@ static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
 /* Flush all mappings for all PCIDs except globals. */
 static inline void invlpgb_flush_all_nonglobals(void)
 {
-	/*
-	 * @addr=0 means both rax[1] (valid PCID) and rax[2] (valid ASID) are clear
-	 * so flush *any* PCID and ASID.
-	 */
+	guard(preempt)();
 	__invlpgb(0, 0, 0, 1, 0, 0);
 	__tlbsync();
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f49627e02311..8cd084bc3d98 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1075,19 +1075,11 @@ void flush_tlb_all(void)
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
 
 	/* First try (faster) hardware-assisted TLB invalidation. */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
-		/*
-		 * TLBSYNC at the end needs to make sure all flushes done
-		 * on the current CPU have been executed system-wide.
-		 * Therefore, make sure nothing gets migrated
-		 * in-between but disable preemption as it is cheaper.
-		 */
-		guard(preempt)();
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
-	} else {
+	else
 		/* Fall back to the IPI-based invalidation. */
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
-	}
 }
 
 /* Flush an arbitrarily large range of memory with INVLPGB. */

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette