+ mm-thp-account-pte-mapped-anonymous-thp-usage.patch added to mm-unstable branch

The patch titled
     Subject: mm: thp: account pte-mapped anonymous THP usage
has been added to the -mm mm-unstable branch.  Its filename is
     mm-thp-account-pte-mapped-anonymous-thp-usage.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-account-pte-mapped-anonymous-thp-usage.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days.

------------------------------------------------------
From: Ryan Roberts <ryan.roberts@xxxxxxx>
Subject: mm: thp: account pte-mapped anonymous THP usage
Date: Fri, 29 Sep 2023 12:44:14 +0100

Add accounting for pte-mapped anonymous transparent hugepages at various
locations.  This visibility will aid in debugging and tuning performance
for the "small order" thp extension that will be added in a subsequent
commit, which allows allocating hugepages that are larger than order-0
but smaller than PMD_ORDER.  This new accounting follows a similar
pattern to the existing NR_ANON_THPS, which measures pmd-mapped anonymous
transparent hugepages.

We account pte-mapped anonymous thp mappings per-page, where the page is
mapped at least once via PTE and the page belongs to a large folio.  When
a page belonging to a large folio is PTE-mapped for the first time, we
add 1 to NR_ANON_THPS_PTEMAPPED.  When a page belonging to a large folio
is PTE-unmapped for the last time, we subtract 1 from
NR_ANON_THPS_PTEMAPPED.
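
The per-page rule above can be illustrated with a small user-space model.
This is only a sketch of the counting rule, not the kernel code: every name
in it (model_folio, model_map_pte, and so on) is hypothetical, the folio
size is an arbitrary assumption, and the per-page count starts at 0 here
whereas the kernel's _mapcount starts at -1.

```c
/*
 * Hypothetical user-space model of the accounting rule.
 * None of these names exist in the kernel.
 */
#define MODEL_NR_PAGES 16	/* assumed: one order-4 large folio */

struct model_folio {
	/* Number of PTE mappings per page; 0 means unmapped. */
	int pte_mapcount[MODEL_NR_PAGES];
};

/* Stand-in for the NR_ANON_THPS_PTEMAPPED counter, in pages. */
static long model_anon_thps_ptemapped;

/* Map one page via PTE; count it only on the 0 -> 1 transition. */
static void model_map_pte(struct model_folio *f, int idx)
{
	if (f->pte_mapcount[idx]++ == 0)
		model_anon_thps_ptemapped++;
}

/* Drop one PTE mapping; uncount only on the 1 -> 0 transition. */
static void model_unmap_pte(struct model_folio *f, int idx)
{
	if (--f->pte_mapcount[idx] == 0)
		model_anon_thps_ptemapped--;
}
```

Mapping the same page from a second process leaves the counter unchanged;
it only drops once the last PTE mapping of that page goes away.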

/proc/meminfo:
  Introduce new "AnonHugePteMap" field, which reports the amount of
  memory (in KiB) mapped from large folios globally (similar to
  AnonHugePages field).

/proc/vmstat:
  Introduce new "nr_anon_thp_pte" field, which reports the amount of
  memory (in pages) mapped from large folios globally (similar to
  nr_anon_transparent_hugepages field).

/sys/devices/system/node/nodeX/meminfo:
  Introduce new "AnonHugePteMap" field, which reports the amount of
  memory (in KiB) mapped from large folios per-node (similar to
  AnonHugePages field).

show_mem (panic logger):
  Introduce new "anon_thp_pte" field, which reports the amount of memory
  (in KiB) mapped from large folios per-node (similar to anon_thp
  field).

memory.stat (cgroup v1 and v2):
  Introduce new "anon_thp_pte" field, which reports the amount of memory
  (in bytes) mapped from large folios in the memcg (similar to rss_huge
  (v1) / anon_thp (v2) fields).

/proc/<pid>/smaps & /proc/<pid>/smaps_rollup:
  Introduce new "AnonHugePteMap" field, which reports the amount of
  memory (in KiB) mapped from large folios within the vma/process
  (similar to AnonHugePages field).
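
As a rough illustration of how a user-space tool might consume the new
fields, the sketch below sums a "Name: <n> kB" field over a text buffer
that the caller has already read from /proc/meminfo or /proc/<pid>/smaps,
and converts a /proc/vmstat page count to KiB.  The helper names
(sum_field_kb, pages_to_kib) are made up for this example.  It does not
handle the "Node N " prefix of the per-node file or the colon-less
"name value" layout of memory.stat and /proc/vmstat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field>: <n> kB" line in a NUL-terminated buffer.
 * Returns the total in KiB, or -1 if the field never appears
 * (for example on a kernel without this patch).
 */
static long sum_field_kb(const char *text, const char *field)
{
	size_t flen = strlen(field);
	long total = -1;
	const char *line = text;

	while (line && *line) {
		unsigned long val;

		/* Match only at the start of a line, followed by ':'. */
		if (strncmp(line, field, flen) == 0 && line[flen] == ':' &&
		    sscanf(line + flen + 1, "%lu", &val) == 1)
			total = (total < 0 ? 0 : total) + (long)val;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return total;
}

/*
 * Convert a page count (as reported by /proc/vmstat) to KiB.
 * Assumes page_size is a multiple of 1024 bytes.
 */
static unsigned long pages_to_kib(unsigned long pages, unsigned long page_size)
{
	return pages * (page_size / 1024);
}
```

Summing AnonHugePteMap over all VMAs of a smaps snapshot this way should
give the value smaps_rollup reports for the same moment, since both come
from the same per-VMA statistics.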

NOTE on charge migration: The new NR_ANON_THPS_PTEMAPPED charge is NOT
moved between cgroups, even when the (v1) memory.move_charge_at_immigrate
feature is enabled.  That feature is marked deprecated and the current
code does not attempt to move the NR_ANON_MAPPED charge for large
PTE-mapped folios anyway (see comment in
mem_cgroup_move_charge_pte_range()).  If this code were enhanced to allow
moving the NR_ANON_MAPPED charge for large PTE-mapped folios, we would
also need to add support for moving the new NR_ANON_THPS_PTEMAPPED charge.
This would likely get quite fiddly.  Given the deprecation of
memory.move_charge_at_immigrate, I assume it is not valuable to implement.

NOTE on naming: Given the new small order anonymous thp feature will be
exposed to user space as an extension to thp, I've opted to name the new
counters after thp also (as opposed to "large"/"large folio"/etc.), so
"huge" no longer strictly means PMD - one could argue hugetlb already
breaks this rule anyway.  I also did not want to risk breaking back compat
by renaming/redefining the existing counters (which would have resulted in
more consistent and clearer names).  So the existing NR_ANON_THPS counters
remain and continue to only refer to PMD-mapped THPs.  And I've added new
counters, which only refer to PTE-mapped THPs.

Link: https://lkml.kernel.org/r/20230929114421.3761121-4-ryan.roberts@xxxxxxx
Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Huang Ying <ying.huang@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Itaru Kitayama <itaru.kitayama@xxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Yin Fengwei <fengwei.yin@xxxxxxxxx>
Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/ABI/testing/procfs-smaps_rollup  |    1 +
 Documentation/admin-guide/cgroup-v1/memory.rst |    5 ++++-
 Documentation/admin-guide/cgroup-v2.rst        |    6 +++++-
 Documentation/admin-guide/mm/transhuge.rst     |   11 +++++++----
 Documentation/filesystems/proc.rst             |   14 ++++++++++++--
 drivers/base/node.c                            |    2 ++
 fs/proc/meminfo.c                              |    2 ++
 fs/proc/task_mmu.c                             |    4 ++++
 include/linux/mmzone.h                         |    1 +
 mm/memcontrol.c                                |    8 ++++++++
 mm/rmap.c                                      |   11 +++++++++--
 mm/show_mem.c                                  |    2 ++
 mm/vmstat.c                                    |    1 +
 13 files changed, 58 insertions(+), 10 deletions(-)

--- a/Documentation/ABI/testing/procfs-smaps_rollup~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/Documentation/ABI/testing/procfs-smaps_rollup
@@ -34,6 +34,7 @@ Description:
 			Anonymous:	      68 kB
 			LazyFree:	       0 kB
 			AnonHugePages:	       0 kB
+			AnonHugePteMap:        0 kB
 			ShmemPmdMapped:	       0 kB
 			Shared_Hugetlb:	       0 kB
 			Private_Hugetlb:       0 kB
--- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -542,7 +542,10 @@ memory.stat file includes following stat
     cache           # of bytes of page cache memory.
     rss             # of bytes of anonymous and swap cache memory (includes
                     transparent hugepages).
-    rss_huge        # of bytes of anonymous transparent hugepages.
+    rss_huge        # of bytes of anonymous transparent hugepages, mapped by
+                    PMD.
+    anon_thp_pte    # of bytes of anonymous transparent hugepages, mapped by
+                    PTE.
     mapped_file     # of bytes of mapped file (includes tmpfs/shmem)
     pgpgin          # of charging events to the memory cgroup. The charging
                     event happens each time a page is accounted as either mapped
--- a/Documentation/admin-guide/cgroup-v2.rst~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1421,7 +1421,11 @@ PAGE_SIZE multiple when read back.
 
 	  anon_thp
 		Amount of memory used in anonymous mappings backed by
-		transparent hugepages
+		transparent hugepages, mapped by PMD
+
+	  anon_thp_pte
+		Amount of memory used in anonymous mappings backed by
+		transparent hugepages, mapped by PTE
 
 	  file_thp
 		Amount of cached filesystem data backed by transparent
--- a/Documentation/admin-guide/mm/transhuge.rst~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/Documentation/admin-guide/mm/transhuge.rst
@@ -291,10 +291,13 @@ Monitoring usage
 ================
 
 The number of anonymous transparent huge pages currently used by the
-system is available by reading the AnonHugePages field in ``/proc/meminfo``.
-To identify what applications are using anonymous transparent huge pages,
-it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields
-for each mapping.
+system is available by reading the AnonHugePages and AnonHugePteMap
+fields in ``/proc/meminfo``. To identify what applications are using
+anonymous transparent huge pages, it is necessary to read
+``/proc/PID/smaps`` and count the AnonHugePages and AnonHugePteMap
+fields for each mapping. Note that in both cases, AnonHugePages refers
+only to PMD-mapped THPs. AnonHugePteMap refers to THPs that are mapped
+using PTEs.
 
 The number of file transparent huge pages mapped to userspace is available
 by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``.
--- a/Documentation/filesystems/proc.rst~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/Documentation/filesystems/proc.rst
@@ -464,6 +464,7 @@ Memory Area, or VMA) there is a series o
     KSM:                   0 kB
     LazyFree:              0 kB
     AnonHugePages:         0 kB
+    AnonHugePteMap:        0 kB
     ShmemPmdMapped:        0 kB
     Shared_Hugetlb:        0 kB
     Private_Hugetlb:       0 kB
@@ -511,7 +512,11 @@ pressure if the memory is clean. Please
 be lower than the real value due to optimizations used in the current
 implementation. If this is not desirable please file a bug report.
 
-"AnonHugePages" shows the amount of memory backed by transparent hugepage.
+"AnonHugePages" shows the amount of memory backed by transparent hugepage,
+mapped by PMD.
+
+"AnonHugePteMap" shows the amount of memory backed by transparent hugepage,
+mapped by PTE.
 
 "ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
 huge pages.
@@ -1006,6 +1011,7 @@ Example output. You may not have all of
     EarlyMemtestBad:       0 kB
     HardwareCorrupted:     0 kB
     AnonHugePages:   4149248 kB
+    AnonHugePteMap:        0 kB
     ShmemHugePages:        0 kB
     ShmemPmdMapped:        0 kB
     FileHugePages:         0 kB
@@ -1165,7 +1171,11 @@ HardwareCorrupted
               The amount of RAM/memory in KB, the kernel identifies as
               corrupted.
 AnonHugePages
-              Non-file backed huge pages mapped into userspace page tables
+              Non-file backed huge pages mapped into userspace page tables by
+              PMD
+AnonHugePteMap
+              Non-file backed huge pages mapped into userspace page tables by
+              PTE
 ShmemHugePages
               Memory used by shared memory (shmem) and tmpfs allocated
               with huge pages
--- a/drivers/base/node.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/drivers/base/node.c
@@ -443,6 +443,7 @@ static ssize_t node_read_meminfo(struct
 			     "Node %d SUnreclaim:     %8lu kB\n"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			     "Node %d AnonHugePages:  %8lu kB\n"
+			     "Node %d AnonHugePteMap: %8lu kB\n"
 			     "Node %d ShmemHugePages: %8lu kB\n"
 			     "Node %d ShmemPmdMapped: %8lu kB\n"
 			     "Node %d FileHugePages:  %8lu kB\n"
@@ -475,6 +476,7 @@ static ssize_t node_read_meminfo(struct
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			     ,
 			     nid, K(node_page_state(pgdat, NR_ANON_THPS)),
+			     nid, K(node_page_state(pgdat, NR_ANON_THPS_PTEMAPPED)),
 			     nid, K(node_page_state(pgdat, NR_SHMEM_THPS)),
 			     nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)),
 			     nid, K(node_page_state(pgdat, NR_FILE_THPS)),
--- a/fs/proc/meminfo.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/fs/proc/meminfo.c
@@ -143,6 +143,8 @@ static int meminfo_proc_show(struct seq_
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	show_val_kb(m, "AnonHugePages:  ",
 		    global_node_page_state(NR_ANON_THPS));
+	show_val_kb(m, "AnonHugePteMap: ",
+		    global_node_page_state(NR_ANON_THPS_PTEMAPPED));
 	show_val_kb(m, "ShmemHugePages: ",
 		    global_node_page_state(NR_SHMEM_THPS));
 	show_val_kb(m, "ShmemPmdMapped: ",
--- a/fs/proc/task_mmu.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/fs/proc/task_mmu.c
@@ -394,6 +394,7 @@ struct mem_size_stats {
 	unsigned long anonymous;
 	unsigned long lazyfree;
 	unsigned long anonymous_thp;
+	unsigned long anonymous_thp_pte;
 	unsigned long shmem_thp;
 	unsigned long file_thp;
 	unsigned long swap;
@@ -454,6 +455,8 @@ static void smaps_account(struct mem_siz
 		mss->anonymous += size;
 		if (!PageSwapBacked(page) && !dirty && !PageDirty(page))
 			mss->lazyfree += size;
+		if (!compound && PageTransCompound(page))
+			mss->anonymous_thp_pte += size;
 	}
 
 	if (PageKsm(page))
@@ -835,6 +838,7 @@ static void __show_smap(struct seq_file
 	SEQ_PUT_DEC(" kB\nKSM:            ", mss->ksm);
 	SEQ_PUT_DEC(" kB\nLazyFree:       ", mss->lazyfree);
 	SEQ_PUT_DEC(" kB\nAnonHugePages:  ", mss->anonymous_thp);
+	SEQ_PUT_DEC(" kB\nAnonHugePteMap: ", mss->anonymous_thp_pte);
 	SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp);
 	SEQ_PUT_DEC(" kB\nFilePmdMapped:  ", mss->file_thp);
 	SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb);
--- a/include/linux/mmzone.h~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/include/linux/mmzone.h
@@ -186,6 +186,7 @@ enum node_stat_item {
 	NR_FILE_THPS,
 	NR_FILE_PMDMAPPED,
 	NR_ANON_THPS,
+	NR_ANON_THPS_PTEMAPPED,
 	NR_VMSCAN_WRITE,
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
--- a/mm/memcontrol.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/mm/memcontrol.c
@@ -827,6 +827,7 @@ void __mod_memcg_lruvec_state(struct lru
 		case NR_ANON_MAPPED:
 		case NR_FILE_MAPPED:
 		case NR_ANON_THPS:
+		case NR_ANON_THPS_PTEMAPPED:
 		case NR_SHMEM_PMDMAPPED:
 		case NR_FILE_PMDMAPPED:
 			WARN_ON_ONCE(!in_task());
@@ -1517,6 +1518,7 @@ static const struct memory_stat memory_s
 #endif
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	{ "anon_thp",			NR_ANON_THPS			},
+	{ "anon_thp_pte",		NR_ANON_THPS_PTEMAPPED		},
 	{ "file_thp",			NR_FILE_THPS			},
 	{ "shmem_thp",			NR_SHMEM_THPS			},
 #endif
@@ -4185,6 +4187,7 @@ static const unsigned int memcg1_stats[]
 	NR_ANON_MAPPED,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	NR_ANON_THPS,
+	NR_ANON_THPS_PTEMAPPED,
 #endif
 	NR_SHMEM,
 	NR_FILE_MAPPED,
@@ -4203,6 +4206,7 @@ static const char *const memcg1_stat_nam
 	"rss",
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	"rss_huge",
+	"anon_thp_pte",
 #endif
 	"shmem",
 	"mapped_file",
@@ -6403,6 +6407,10 @@ retry:
 			 * can be done but it would be too convoluted so simply
 			 * ignore such a partial THP and keep it in original
 			 * memcg. There should be somebody mapping the head.
+			 * This simplification also means that pte-mapped large
+			 * folios are never migrated, which means we don't need
+			 * to worry about migrating the NR_ANON_THPS_PTEMAPPED
+			 * accounting.
 			 */
 			if (PageTransCompound(page))
 				goto put;
--- a/mm/rmap.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/mm/rmap.c
@@ -1244,7 +1244,7 @@ void page_add_anon_rmap(struct page *pag
 {
 	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
+	int nr = 0, nr_pmdmapped = 0, nr_lgmapped = 0;
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
@@ -1253,6 +1253,7 @@ void page_add_anon_rmap(struct page *pag
 		first = atomic_inc_and_test(&page->_mapcount);
 		nr = first;
 		if (first && folio_test_large(folio)) {
+			nr_lgmapped = 1;
 			nr = atomic_inc_return_relaxed(mapped);
 			nr = (nr < COMPOUND_MAPPED);
 		}
@@ -1277,6 +1278,8 @@ void page_add_anon_rmap(struct page *pag
 
 	if (nr_pmdmapped)
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
+	if (nr_lgmapped)
+		__lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, nr_lgmapped);
 	if (nr)
 		__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
 
@@ -1350,6 +1353,7 @@ void folio_add_new_anon_rmap(struct foli
 		}
 
 		atomic_set(&folio->_nr_pages_mapped, nr);
+		__lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
@@ -1464,7 +1468,7 @@ void page_remove_rmap(struct page *page,
 {
 	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
+	int nr = 0, nr_pmdmapped = 0, nr_lgmapped = 0;
 	bool last;
 	enum node_stat_item idx;
 
@@ -1482,6 +1486,7 @@ void page_remove_rmap(struct page *page,
 		last = atomic_add_negative(-1, &page->_mapcount);
 		nr = last;
 		if (last && folio_test_large(folio)) {
+			nr_lgmapped = 1;
 			nr = atomic_dec_return_relaxed(mapped);
 			nr = (nr < COMPOUND_MAPPED);
 		}
@@ -1513,6 +1518,8 @@ void page_remove_rmap(struct page *page,
 			idx = NR_FILE_PMDMAPPED;
 		__lruvec_stat_mod_folio(folio, idx, -nr_pmdmapped);
 	}
+	if (nr_lgmapped && folio_test_anon(folio))
+		__lruvec_stat_mod_folio(folio, NR_ANON_THPS_PTEMAPPED, -nr_lgmapped);
 	if (nr) {
 		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
 		__lruvec_stat_mod_folio(folio, idx, -nr);
--- a/mm/show_mem.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/mm/show_mem.c
@@ -251,6 +251,7 @@ static void show_free_areas(unsigned int
 			" shmem_thp:%lukB"
 			" shmem_pmdmapped:%lukB"
 			" anon_thp:%lukB"
+			" anon_thp_pte:%lukB"
 #endif
 			" writeback_tmp:%lukB"
 			" kernel_stack:%lukB"
@@ -277,6 +278,7 @@ static void show_free_areas(unsigned int
 			K(node_page_state(pgdat, NR_SHMEM_THPS)),
 			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)),
 			K(node_page_state(pgdat, NR_ANON_THPS)),
+			K(node_page_state(pgdat, NR_ANON_THPS_PTEMAPPED)),
 #endif
 			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
 			node_page_state(pgdat, NR_KERNEL_STACK_KB),
--- a/mm/vmstat.c~mm-thp-account-pte-mapped-anonymous-thp-usage
+++ a/mm/vmstat.c
@@ -1228,6 +1228,7 @@ const char * const vmstat_text[] = {
 	"nr_file_hugepages",
 	"nr_file_pmdmapped",
 	"nr_anon_transparent_hugepages",
+	"nr_anon_thp_pte",
 	"nr_vmscan_write",
 	"nr_vmscan_immediate_reclaim",
 	"nr_dirtied",
_

Patches currently in -mm which might be from ryan.roberts@xxxxxxx are

mm-hugetlb-add-huge-page-size-param-to-set_huge_pte_at.patch
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
mm-allow-deferred-splitting-of-arbitrary-anon-large-folios.patch
mm-non-pmd-mappable-large-folios-for-folio_add_new_anon_rmap.patch
mm-thp-account-pte-mapped-anonymous-thp-usage.patch
mm-thp-introduce-anon_orders-and-anon_always_mask-sysfs-files.patch
mm-thp-extend-thp-to-allocate-anonymous-large-folios.patch
mm-thp-add-recommend-option-for-anon_orders.patch
arm64-mm-override-arch_wants_pte_order.patch
selftests-mm-cow-generalize-do_run_with_thp-helper.patch
selftests-mm-cow-add-tests-for-small-order-anon-thp.patch



