The patch titled Subject: mm: count the number of anonymous THPs per size has been added to the -mm mm-unstable branch. Its filename is mm-count-the-number-of-anonymous-thps-per-size.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-count-the-number-of-anonymous-thps-per-size.patch This patch will later appear in the mm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Barry Song <v-songbaohua@xxxxxxxx> Subject: mm: count the number of anonymous THPs per size Date: Sat, 24 Aug 2024 13:04:40 +1200 Patch series "mm: count the number of anonymous THPs per size", v4. Knowing the number of transparent anon THPs in the system is crucial for performance analysis. It helps in understanding the ratio and distribution of THPs versus small folios throughout the system. Additionally, partial unmapping by userspace can lead to significant waste of THPs over time and increase memory reclamation pressure. We need this information for comprehensive system tuning. This patch (of 2): Let's track for each anonymous THP size, how many of them are currently allocated. We'll track the complete lifespan of an anon THP, starting when it becomes an anon THP ("large anon folio") (->mapping gets set), until it gets freed (->mapping gets cleared). Introduce a new "nr_anon" counter per THP size and adjust the corresponding counter in the following cases: * We allocate a new THP and call folio_add_new_anon_rmap() to map it the first time and turn it into an anon THP. * We split an anon THP into multiple smaller ones. * We migrate an anon THP, when we prepare the destination. * We free an anon THP back to the buddy. Note that AnonPages in /proc/meminfo currently tracks the total number of *mapped* anonymous *pages*, and therefore has slightly different semantics. In the future, we might also want to track "nr_anon_mapped" for each THP size, which might be helpful when comparing it to the number of allocated anon THPs (long-term pinning, stuck in swapcache, memory leaks, ...). Further note that for now, we only track anon THPs after they got their ->mapping set, for example via folio_add_new_anon_rmap(). If we would allocate some in the swapcache, they will only show up in the statistics for now after they have been mapped to user space the first time, where we call folio_add_new_anon_rmap(). Link: https://lkml.kernel.org/r/20240824010441.21308-1-21cnbao@xxxxxxxxx Link: https://lkml.kernel.org/r/20240824010441.21308-2-21cnbao@xxxxxxxxx Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> Acked-by: David Hildenbrand <david@xxxxxxxxxx> Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> Cc: Chris Li <chrisl@xxxxxxxxxx> Cc: Chuanhua Han <hanchuanhua@xxxxxxxx> Cc: Kairui Song <kasong@xxxxxxxxxxx> Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx> Cc: Lance Yang <ioworker0@xxxxxxxxx> Cc: Ryan Roberts <ryan.roberts@xxxxxxx> Cc: Shuai Yuan <yuanshuai@xxxxxxxx> Cc: Usama Arif <usamaarif642@xxxxxxxxx> Cc: Zi Yan <ziy@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- Documentation/admin-guide/mm/transhuge.rst | 5 +++++ include/linux/huge_mm.h | 15 +++++++++++++-- mm/huge_memory.c | 13 ++++++++++--- mm/migrate.c | 4 ++++ mm/page_alloc.c | 5 ++++- mm/rmap.c | 1 + 6 files changed, 37 insertions(+), 6 deletions(-) --- a/Documentation/admin-guide/mm/transhuge.rst~mm-count-the-number-of-anonymous-thps-per-size +++ a/Documentation/admin-guide/mm/transhuge.rst @@ -551,6 +551,11 @@ split_deferred it would free up some memory. Pages on split queue are going to be split under memory pressure, if splitting is possible. +nr_anon + the number of transparent anon huge pages we have in the whole system. + These huge pages could be entirely mapped or have partially + unmapped/unused subpages. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. There are some counters in ``/proc/vmstat`` to help --- a/include/linux/huge_mm.h~mm-count-the-number-of-anonymous-thps-per-size +++ a/include/linux/huge_mm.h @@ -126,6 +126,7 @@ enum mthp_stat_item { MTHP_STAT_SPLIT, MTHP_STAT_SPLIT_FAILED, MTHP_STAT_SPLIT_DEFERRED, + MTHP_STAT_NR_ANON, __MTHP_STAT_COUNT }; @@ -136,14 +137,24 @@ struct mthp_stat { DECLARE_PER_CPU(struct mthp_stat, mthp_stats); -static inline void count_mthp_stat(int order, enum mthp_stat_item item) +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta) { if (order <= 0 || order > PMD_ORDER) return; - this_cpu_inc(mthp_stats.stats[order][item]); + this_cpu_add(mthp_stats.stats[order][item], delta); +} + +static inline void count_mthp_stat(int order, enum mthp_stat_item item) +{ + mod_mthp_stat(order, item, 1); } + #else +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta) +{ +} + static inline void count_mthp_stat(int order, enum mthp_stat_item item) { } --- a/mm/huge_memory.c~mm-count-the-number-of-anonymous-thps-per-size +++ a/mm/huge_memory.c @@ -597,6 +597,7 @@ DEFINE_MTHP_STAT_ATTR(shmem_fallback_cha DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT); DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); +DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); static struct attribute *anon_stats_attrs[] = { &anon_fault_alloc_attr.attr, @@ -609,6 +610,7 @@ static struct attribute *anon_stats_attr &split_attr.attr, &split_failed_attr.attr, &split_deferred_attr.attr, + &nr_anon_attr.attr, NULL, }; @@ -3314,8 +3316,9 @@ int split_huge_page_to_list_to_order(str struct deferred_split *ds_queue = get_deferred_split_queue(folio); /* reset xarray order to new order after split */ XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order); - struct anon_vma *anon_vma = NULL; + bool is_anon = folio_test_anon(folio); struct address_space *mapping = NULL; + struct anon_vma *anon_vma = NULL; int order = folio_order(folio); int extra_pins, ret; pgoff_t end; @@ -3327,7 +3330,7 @@ int split_huge_page_to_list_to_order(str if (new_order >= folio_order(folio)) return -EINVAL; - if (folio_test_anon(folio)) { + if (is_anon) { /* order-1 is not supported for anonymous THP. */ if (new_order == 1) { VM_WARN_ONCE(1, "Cannot split to order-1 folio"); @@ -3367,7 +3370,7 @@ int split_huge_page_to_list_to_order(str if (folio_test_writeback(folio)) return -EBUSY; - if (folio_test_anon(folio)) { + if (is_anon) { /* * The caller does not necessarily hold an mmap_lock that would * prevent the anon_vma disappearing so we first we take a @@ -3480,6 +3483,10 @@ int split_huge_page_to_list_to_order(str } } + if (is_anon) { + mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1); + mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, 1 << (order - new_order)); + } __split_huge_page(page, list, end, new_order); ret = 0; } else { --- a/mm/migrate.c~mm-count-the-number-of-anonymous-thps-per-size +++ a/mm/migrate.c @@ -450,6 +450,8 @@ static int __folio_migrate_mapping(struc /* No turning back from here */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; + if (folio_test_anon(folio) && folio_test_large(folio)) + mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); if (folio_test_swapbacked(folio)) __folio_set_swapbacked(newfolio); @@ -474,6 +476,8 @@ static int __folio_migrate_mapping(struc */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; + if (folio_test_anon(folio) && folio_test_large(folio)) + mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); folio_ref_add(newfolio, nr); /* add cache reference */ if (folio_test_swapbacked(folio)) { __folio_set_swapbacked(newfolio); --- a/mm/page_alloc.c~mm-count-the-number-of-anonymous-thps-per-size +++ a/mm/page_alloc.c @@ -1084,8 +1084,11 @@ __always_inline bool free_pages_prepare( (page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; } } - if (PageMappingFlags(page)) + if (PageMappingFlags(page)) { + if (PageAnon(page)) + mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1); page->mapping = NULL; + } if (is_check_pages_enabled()) { if (free_page_is_bad(page)) bad++; --- a/mm/rmap.c~mm-count-the-number-of-anonymous-thps-per-size +++ a/mm/rmap.c @@ -1467,6 +1467,7 @@ void folio_add_new_anon_rmap(struct foli } __folio_mod_stat(folio, nr, nr_pmdmapped); + mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); } static __always_inline void __folio_add_file_rmap(struct folio *folio, _ Patches currently in -mm which might be from v-songbaohua@xxxxxxxx are mm-extend-usage-parameter-so-that-cluster_swap_free_nr-can-be-reused.patch mm-swap-add-nr-argument-in-swapcache_prepare-and-swapcache_clear-to-support-large-folios.patch mm-swap-add-nr-argument-in-swapcache_prepare-and-swapcache_clear-to-support-large-folios-fix.patch vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch mm-document-__gfp_nofail-must-be-blockable.patch mm-bug_on-to-avoid-null-deference-while-__gfp_nofail-fails.patch mm-prohibit-null-deference-exposed-for-unsupported-non-blockable-__gfp_nofail.patch mm-rename-instances-of-swap_info_struct-to-meaningful-si.patch mm-attempt-to-batch-free-swap-entries-for-zap_pte_range.patch mm-attempt-to-batch-free-swap-entries-for-zap_pte_range-fix.patch mm-override-mthp-enabled-defaults-at-kernel-cmdline-fix.patch mm-count-the-number-of-anonymous-thps-per-size.patch mm-count-the-number-of-partially-mapped-anonymous-thps-per-size.patch