The patch titled Subject: mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap has been added to the -mm mm-unstable branch. Its filename is mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap.patch This patch will later appear in the mm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Barry Song <v-songbaohua@xxxxxxxx> Subject: mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap Date: Wed, 15 Jan 2025 16:38:08 +1300 The try_to_unmap_one() function currently handles PMD-mapped THPs inefficiently. It first splits the PMD into PTEs, copies the dirty state from the PMD to the PTEs, iterates over the PTEs to locate the dirty state, and then marks the THP as swap-backed. This process involves unnecessary PMD splitting and redundant iteration. Instead, this functionality can be efficiently managed in __discard_anon_folio_pmd_locked(), avoiding the extra steps and improving performance. The following microbenchmark redirties folios after invoking MADV_FREE, then measures the time taken to perform memory reclamation (actually set those folios swapbacked again) on the redirtied folios. #include <stdio.h> #include <sys/mman.h> #include <string.h> #include <time.h> #define SIZE 128*1024*1024 // 128 MB int main(int argc, char *argv[]) { while(1) { volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); memset((void *)p, 1, SIZE); madvise((void *)p, SIZE, MADV_FREE); /* redirty after MADV_FREE */ memset((void *)p, 1, SIZE); clock_t start_time = clock(); madvise((void *)p, SIZE, MADV_PAGEOUT); clock_t end_time = clock(); double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC; printf("Time taken by reclamation: %f seconds\n", elapsed_time); munmap((void *)p, SIZE); } return 0; } Testing results are as below, w/o patch: ~ # ./a.out Time taken by reclamation: 0.007300 seconds Time taken by reclamation: 0.007226 seconds Time taken by reclamation: 0.007295 seconds Time taken by reclamation: 0.007731 seconds Time taken by reclamation: 0.007134 seconds Time taken by reclamation: 0.007285 seconds Time taken by reclamation: 0.007720 seconds Time taken by reclamation: 0.007128 seconds Time taken by reclamation: 0.007710 seconds Time taken by reclamation: 0.007712 seconds Time taken by reclamation: 0.007236 seconds Time taken by reclamation: 0.007690 seconds Time taken by reclamation: 0.007174 seconds Time taken by reclamation: 0.007670 seconds Time taken by reclamation: 0.007169 seconds Time taken by reclamation: 0.007305 seconds Time taken by reclamation: 0.007432 seconds Time taken by reclamation: 0.007158 seconds Time taken by reclamation: 0.007133 seconds â?¦ w/ patch ~ # ./a.out Time taken by reclamation: 0.002124 seconds Time taken by reclamation: 0.002116 seconds Time taken by reclamation: 0.002150 seconds Time taken by reclamation: 0.002261 seconds Time taken by reclamation: 0.002137 seconds Time taken by reclamation: 0.002173 seconds Time taken by reclamation: 0.002063 seconds Time taken by reclamation: 0.002088 seconds Time taken by reclamation: 0.002169 seconds Time taken by reclamation: 0.002124 seconds Time taken by reclamation: 0.002111 seconds Time taken by reclamation: 0.002224 seconds Time taken by reclamation: 0.002297 seconds Time taken by reclamation: 0.002260 seconds Time taken by reclamation: 0.002246 seconds Time taken by reclamation: 0.002272 seconds Time taken by reclamation: 0.002277 seconds Time taken by reclamation: 0.002462 seconds â?¦ This patch significantly speeds up try_to_unmap_one() by allowing it to skip redirtied THPs without splitting the PMD. Link: https://lkml.kernel.org/r/20250115033808.40641-5-21cnbao@xxxxxxxxx Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> Suggested-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> Suggested-by: Lance Yang <ioworker0@xxxxxxxxx> Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> Reviewed-by: Lance Yang <ioworker0@xxxxxxxxx> Cc: Albert Ou <aou@xxxxxxxxxxxxxxxxx> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx> Cc: Borislav Petkov <bp@xxxxxxxxx> Cc: Catalin Marinas <catalin.marinas@xxxxxxx> Cc: Chis Li <chrisl@xxxxxxxxxx> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: Gavin Shan <gshan@xxxxxxxxxx> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> Cc: "Huang, Ying" <ying.huang@xxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Kairui Song <kasong@xxxxxxxxxxx> Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> Cc: Mark Rutland <mark.rutland@xxxxxxx> Cc: Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx> Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx> Cc: Ryan Roberts <ryan.roberts@xxxxxxx> Cc: Shaoqin Huang <shahuang@xxxxxxxxxx> Cc: Tangquan Zheng <zhengtangquan@xxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Will Deacon <will@xxxxxxxxxx> Cc: Yicong Yang <yangyicong@xxxxxxxxxxxxx> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/huge_memory.c | 24 +++++++++++++++++------- mm/rmap.c | 13 ++++++++++--- 2 files changed, 27 insertions(+), 10 deletions(-) --- a/mm/huge_memory.c~mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap +++ a/mm/huge_memory.c @@ -3070,8 +3070,12 @@ static bool __discard_anon_folio_pmd_loc int ref_count, map_count; pmd_t orig_pmd = *pmdp; - if (folio_test_dirty(folio) || pmd_dirty(orig_pmd)) + if (pmd_dirty(orig_pmd)) + folio_set_dirty(folio); + if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) { + folio_set_swapbacked(folio); return false; + } orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp); @@ -3098,8 +3102,15 @@ static bool __discard_anon_folio_pmd_loc * * The only folio refs must be one from isolation plus the rmap(s). */ - if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) || - ref_count != map_count + 1) { + if (pmd_dirty(orig_pmd)) + folio_set_dirty(folio); + if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) { + folio_set_swapbacked(folio); + set_pmd_at(mm, addr, pmdp, orig_pmd); + return false; + } + + if (ref_count != map_count + 1) { set_pmd_at(mm, addr, pmdp, orig_pmd); return false; } @@ -3119,12 +3130,11 @@ bool unmap_huge_pmd_locked(struct vm_are { VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio); VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio); VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE)); - if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) - return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio); - - return false; + return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio); } static void remap_page(struct folio *folio, unsigned long nr, int flags) --- a/mm/rmap.c~mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap +++ a/mm/rmap.c @@ -1724,9 +1724,16 @@ static bool try_to_unmap_one(struct foli } if (!pvmw.pte) { - if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, - folio)) - goto walk_done; + if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) { + if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio)) + goto walk_done; + /* + * unmap_huge_pmd_locked has either already marked + * the folio as swap-backed or decided to retain it + * due to GUP or speculative references. + */ + goto walk_abort; + } if (flags & TTU_SPLIT_HUGE_PMD) { /* _ Patches currently in -mm which might be from v-songbaohua@xxxxxxxx are mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one.patch mm-support-tlbbatch-flush-for-a-range-of-ptes.patch mm-support-batched-unmap-for-lazyfree-large-folios-during-reclamation.patch mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap.patch