From: Barry Song <v-songbaohua@xxxxxxxx>

Flushing the TLB is usually quite expensive, whether it is done through
hardware or software. Both x86 and arm64 have tried to reduce the
overhead by either removing the TLB flush or deferring it in
ptep_clear_flush_young(). Removing the TLB flush gives about a 20% ~ 30%
swapout speedup on x86 according to commit b13b1d2d8692b ("x86/mm: In
the PTE swapout page reclaim case clear the accessed bit instead of
flushing the TLB"). A similar result was reported on arm64 by commit
3403e56b41c1 ("arm64: mm: Don't wait for completion of TLB invalidation
when page aging").

While platforms like x86 and arm64 have noticed the problem and resolved
it by modifying ptep_clear_flush_young() to drop the flush by some
means, most platforms still do the TLB flush.

In the LRU code, it seems pointless to do a TLB broadcast simply to
update the access bit. Dropping the flush in the generic LRU code seems
more appropriate than removing the TLB flush from
ptep_clear_flush_young() on every platform, since the name of the
function implies that a flush is included. Removing the flush from a
function that is named after the flush is confusing.

So this patch switches the LRU code to ptep_clear_young_notify(), which
clearly does not flush. This will help reduce the cost of TLB broadcasts
caused by access bit updates in the LRU. The side effect is a minor loss
in the accuracy of PTE young data, but mainstream platforms like x86 and
arm64 have shown this not to be harmful.

Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Alex Van Brunt <avanbrunt@xxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
---
 This RFC is inspired by the discussion in Yu Zhao's MGLRU:
 https://lore.kernel.org/lkml/CAOUHufYvH2LaGyAJZFQNOsGDBKD2++aFnTV6=qaVtcNrKjS_bA@xxxxxxxxxxxxxx/

 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 5bcb334cd6f2..7ce6f0b6c330 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -830,7 +830,7 @@ static bool folio_referenced_one(struct folio *folio,
 		}
 
 		if (pvmw.pte) {
-			if (ptep_clear_flush_young_notify(vma, address,
+			if (ptep_clear_young_notify(vma, address,
 						pvmw.pte)) {
 				/*
 				 * Don't treat a reference through
-- 
2.25.1