This patchset extends khugepaged from collapsing only PMD-sized THPs to
collapsing anonymous mTHPs.

mTHPs were introduced in the kernel to improve memory management by
allocating memory in larger chunks, so as to reduce the number of page
faults, reduce TLB misses (thanks to TLB coalescing), shorten the LRU
lists, etc. However, the mTHP property is often lost due to CoW,
swap-in/out, and when the kernel simply cannot find enough physically
contiguous memory to allocate on fault. Hence, there is a need to regain
mTHPs in the system asynchronously. This work is an attempt in this
direction, starting with anonymous folios.

In the fault handler, we select the THP order in a greedy manner; the
same approach is used here, along with the same sysfs interface to
control the order of collapse. In contrast to PMD-collapse, we
(hopefully) get rid of the mmap_write_lock().

---------------------------------------------------------
Testing
---------------------------------------------------------

The series has been build-tested on x86_64.

On AArch64:
1. mm-selftests: No regressions.
2. Analyzed with tools/mm/thpmaps on different userspace programs that
   map aligned VMAs of a large size, fault in basepages/mTHPs (according
   to sysfs), and then madvise() the VMA: khugepaged is able to collapse
   the VMAs completely (100%).

This patchset is rebased on mm-unstable
(4637fa5d47a49c977116321cc575ea22215df22d).
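As a rough illustration of the greedy order selection mentioned in the
overview, here is a simplified userspace model. It is not code from this
series: pick_collapse_order(), the 4K page size, and the PMD order of 9 are
assumptions made for the sketch. The idea is to try the largest enabled
order first and fall back to smaller ones until the naturally aligned
block around the address fits inside the VMA.

```c
#include <assert.h>

#define PAGE_SHIFT 12UL   /* assumption: 4K base pages */
#define PMD_ORDER  9      /* assumption: 2M PMD with 4K pages */

/*
 * Hypothetical helper (not from the patchset): pick the largest order
 * enabled in @enabled_orders (bit N set means order N is enabled) such
 * that the naturally aligned block of 2^order pages containing @addr
 * lies entirely within [vm_start, vm_end). Returns -1 if no enabled
 * order fits. Order 0 (a base page) is never a collapse target.
 */
static int pick_collapse_order(unsigned long addr,
                               unsigned long vm_start,
                               unsigned long vm_end,
                               unsigned long enabled_orders)
{
	assert(vm_start <= addr && addr < vm_end);

	/* Greedy: walk from the largest candidate order downwards. */
	for (int order = PMD_ORDER; order > 0; order--) {
		unsigned long size, start;

		if (!(enabled_orders & (1UL << order)))
			continue;

		size = 1UL << (PAGE_SHIFT + order);
		start = addr & ~(size - 1);   /* align down to block size */

		if (start >= vm_start && start + size <= vm_end)
			return order;
	}
	return -1;
}
```

For example, with orders 9 and 4 enabled, a PMD-aligned 2M VMA would select
order 9 in this model, while a 128K VMA that is only 64K-aligned would fall
back to order 4.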

v1->v2:
 - Handle VMAs less than PMD size (patches 12-15)
 - Do not add mTHP into deferred split queue
 - Drop lock optimization and collapse mTHP under mmap_write_lock()
 - Define policy on what to do when we encounter a folio order larger
   than the order we are scanning for
 - Prevent the creep problem by enforcing tunable simplification
 - Update Documentation
 - Drop patch 12 from v1 updating selftest w.r.t the creep problem
 - Drop patch 1 from v1

v1:
https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@xxxxxxx/

Dev Jain (17):
  khugepaged: Generalize alloc_charge_folio()
  khugepaged: Generalize hugepage_vma_revalidate()
  khugepaged: Generalize __collapse_huge_page_swapin()
  khugepaged: Generalize __collapse_huge_page_isolate()
  khugepaged: Generalize __collapse_huge_page_copy()
  khugepaged: Abstract PMD-THP collapse
  khugepaged: Scan PTEs order-wise
  khugepaged: Introduce vma_collapse_anon_folio()
  khugepaged: Define collapse policy if a larger folio is already mapped
  khugepaged: Exit early on fully-mapped aligned mTHP
  khugepaged: Enable sysfs to control order of collapse
  khugepaged: Enable variable-sized VMA collapse
  khugepaged: Lock all VMAs mapping the PTE table
  khugepaged: Reset scan address to correct alignment
  khugepaged: Delay cond_resched()
  khugepaged: Implement strict policy for mTHP collapse
  Documentation: transhuge: Define khugepaged mTHP collapse policy

 Documentation/admin-guide/mm/transhuge.rst |  49 +-
 include/linux/huge_mm.h                    |   2 +
 mm/huge_memory.c                           |   4 +
 mm/khugepaged.c                            | 603 ++++++++++++++++-----
 4 files changed, 511 insertions(+), 147 deletions(-)

--
2.30.2