Patch "mm: avoid unnecessary flush on change_huge_pmd()" has been added to the 5.15-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 15 Jun 2024 22:16:20 -0400

This is a note to let you know that I've just added the patch titled

    mm: avoid unnecessary flush on change_huge_pmd()

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-avoid-unnecessary-flush-on-change_huge_pmd.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 6f73cf81e6438c334ae03321c915e9d376501fd8
Author: Nadav Amit <nadav.amit@xxxxxxxxx>
Date:   Mon May 9 18:20:50 2022 -0700

    mm: avoid unnecessary flush on change_huge_pmd()
    
    [ Upstream commit 4f83145721f362c2f4d312edc4755269a2069488 ]
    
    Calls to change_protection_range() on THP can trigger, at least on x86,
    two TLB flushes for one page: one immediately, when pmdp_invalidate() is
    called by change_huge_pmd(), and then another one later (that can be
    batched) when change_protection_range() finishes.
    
    The first TLB flush is only necessary to prevent the dirty bit (and with a
    lesser importance the access bit) from changing while the PTE is modified.
    However, this is not necessary as the x86 CPUs set the dirty-bit
    atomically with an additional check that the PTE is (still) present.  One
    caveat is Intel's Knights Landing that has a bug and does not do so.
    
    Leverage this behavior to eliminate the unnecessary TLB flush in
    change_huge_pmd().  Introduce a new arch specific pmdp_invalidate_ad()
    that only invalidates the access and dirty bit from further changes.
    
    Link: https://lkml.kernel.org/r/20220401180821.1986781-4-namit@xxxxxxxxxx
    Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
    Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
    Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Cc: Andy Lutomirski <luto@xxxxxxxxxx>
    Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
    Cc: Peter Xu <peterx@xxxxxxxxxx>
    Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Cc: Will Deacon <will@xxxxxxxxxx>
    Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
    Cc: Nick Piggin <npiggin@xxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Stable-dep-of: 3a5a8d343e1c ("mm: fix race between __split_huge_pmd_locked() and GUP-fast")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
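
The reasoning in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the `toy_*` names and the bit positions are hypothetical, and the "hardware" is modeled as a compare-and-swap loop that sets the dirty bit only while the entry is still present. It shows why an atomic exchange to a non-present entry is enough to stop later dirty-bit updates from being lost, without an immediate TLB flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this toy model; not the real x86 encoding. */
#define TOY_PRESENT  (UINT64_C(1) << 0)
#define TOY_ACCESSED (UINT64_C(1) << 5)
#define TOY_DIRTY    (UINT64_C(1) << 6)

typedef _Atomic uint64_t toy_pmd_t;

/*
 * Models the CPU behavior the commit message relies on: the dirty bit
 * is set atomically, and only if the entry is (still) present.
 */
static bool toy_hw_set_dirty(toy_pmd_t *pmdp)
{
	uint64_t old = atomic_load(pmdp);

	do {
		if (!(old & TOY_PRESENT))
			return false;	/* entry was invalidated: no update */
	} while (!atomic_compare_exchange_weak(pmdp, &old, old | TOY_DIRTY));

	return true;
}

/*
 * Models the flush-free invalidation: atomically swap in a non-present
 * copy and return the previous value. Any dirty bit set before the
 * exchange is captured in the return value.
 */
static uint64_t toy_invalidate_ad(toy_pmd_t *pmdp)
{
	uint64_t invalid = atomic_load(pmdp) & ~TOY_PRESENT;

	return atomic_exchange(pmdp, invalid);
}

/* Walks through the sequence: write, invalidate, late write. */
void toy_demo(void)
{
	toy_pmd_t pmd;
	uint64_t old;

	atomic_init(&pmd, TOY_PRESENT | TOY_ACCESSED);

	assert(toy_hw_set_dirty(&pmd));		/* write while present */
	old = toy_invalidate_ad(&pmd);
	assert(old & TOY_DIRTY);		/* earlier update is not lost */
	assert(!toy_hw_set_dirty(&pmd));	/* late write cannot modify it */
	assert(atomic_load(&pmd) == (old & ~TOY_PRESENT));
}
```

In this model the caller builds the new entry from the returned old value, much as change_huge_pmd() passes oldpmd to pmd_modify() in the diff below. A flush may still be needed later if the new protection is stricter, which is the batched flush the commit message refers to.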

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ecb..c04be133a6cd7 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1146,6 +1146,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec7..f16059e9a85e7 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -608,6 +608,16 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	return young;
 }
+
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	/*
+	 * No flush is necessary. Once an invalid PTE is established, the PTE's
+	 * access and dirty bits cannot be updated.
+	 */
+	return pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+}
 #endif
 
 /**
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d468efcf48f45..952969aa19ec1 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -562,6 +562,26 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+
+/*
+ * pmdp_invalidate_ad() invalidates the PMD while changing a transparent
+ * hugepage mapping in the page tables. This function is similar to
+ * pmdp_invalidate(), but should only be used if the access and dirty bits would
+ * not be cleared by the software in the new PMD value. The function ensures
+ * that hardware changes of the access and dirty bits updates would not be lost.
+ *
+ * Doing so can allow in certain architectures to avoid a TLB flush in most
+ * cases. Yet, another TLB flush might be necessary later if the PMD update
+ * itself requires such flush (e.g., if protection was set to be stricter). Yet,
+ * even when a TLB flush is needed because of the update, the caller may be able
+ * to batch these TLB flushing operations, so fewer TLB flush operations are
+ * needed.
+ */
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8ab6316d85391..265ef8d1393c5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1798,10 +1798,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
+	 * pmdp_invalidate_ad() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	oldpmd = pmdp_invalidate(vma, addr, pmd);
+	oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
 
 	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4e640baf97948..b0ce6c7391bf4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -200,6 +200,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)