On 22.08.24 11:21, Barry Song wrote:
On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 22.08.24 10:44, Barry Song wrote:
On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 12.08.24 00:49, Barry Song wrote:
From: Barry Song <v-songbaohua@xxxxxxxx>
Anon large folios come from three places:
1. new allocated large folios in PF, they will call folio_add_new_anon_rmap()
for rmap;
2. a large folio is split into multiple lower-order large folios;
3. a large folio is migrated to a new large folio.
In all above three counts, we increase nr_anon by 1;
Anon large folios might go either because of be split or be put
to free, in these cases, we reduce the count by 1.
Folios that have been added to the swap cache but have not yet received
an anon mapping won't be counted. This is consistent with the AnonPages
statistics in /proc/meminfo.
Thinking out loud, I wonder if we want to have something like that for
any anon folios (including small ones).
Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
be quite interesting to see that these are actually anon pages still
consuming memory. Same with memory leaks, when an anon folio doesn't get
freed for some reason.
The whole "AnonPages" counter thingy is just confusing, it only counts
what's currently mapped ... so we'd want something different.
But it's okay to start with large folios only, there we have a new
interface without that legacy stuff :)
We have two options to do this:
1. add a new separate nr_anon_unmapped interface which
counts unmapped anon memory only
2. let the nr_anon count both mapped and unmapped anon
folios.
I would assume 1 is clearer as right now AnonPages have been
there for years. and counting all mapped and unmapped together,
we are still lacking an approach to find out anon memory leak
problem you mentioned.
We are right now comparing nr_anon(including mapped folios only)
with AnonPages to get the distribution of different folio sizes in
performance profiling.
unmapped_nr_anon should be normally always quite small. otherwise,
something must be wrong.
Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
---
Documentation/admin-guide/mm/transhuge.rst | 5 +++++
include/linux/huge_mm.h | 15 +++++++++++++--
mm/huge_memory.c | 13 ++++++++++---
mm/migrate.c | 4 ++++
mm/page_alloc.c | 5 ++++-
mm/rmap.c | 1 +
6 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..9fdfb46e4560 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -527,6 +527,11 @@ split_deferred
it would free up some memory. Pages on split queue are going to
be split under memory pressure, if splitting is possible.
+nr_anon
+ the number of anon huge pages we have in the whole system.
"transparent ..." otherwise people might confuse it with anon hugetlb
"huge pages" ... :)
I briefly tried coming up with a better name than "nr_anon" but failed.
if we might have unmapped_anon counter later, maybe rename it to
nr_anon_mapped? and the new interface we will have in the future
might be nr_anon_unmapped?
We really shouldn't be using the mapped/unmapped terminology here ... we
allocated pages and turned them into anonymous folios. At some point we
free them. That's the lifecycle.
On second thought, this might be incorrect as well. Concepts like 'anon',
'shmem', and 'file' refer to states after mapping. If an 'anon' has been
unmapped but is still pinned and not yet freed, it isn't technically an
'anon' anymore?
It's just not mapped, and cannot get mapped, anymore. In the memdesc
world, we'd be freeing the "struct anon" or "struct folio" once the last
refcount goes to 0, not once (e.g., temporarily during a failed
migration?) unmapped.
The important part to me would be: this is memory that was allocated for
anonymous memory, and it's still around for some reason and not getting
freed. Usually, we would expect anon memory to get freed fairly quickly
once unmapped. Except when there are long-term pinnings or other types
of memory leaks.
You could happily continue using these anon pages via vmsplice() or
similar, even thought he original page table mapping was torn down.
On the other hand, implementing nr_anon_unmapped could be extremely
tricky. I have no idea how to implement it as we are losing those mapping
flags.
folio_mapcount() can tell you efficiently whether a folio is mapped or
not -- and that information will stay for all eternity as long as we
have any mapcounts :) . It cannot tell "how many" pages of a large folio
are mapped, but at least "is any page of this large folio mapped".
Exactly. AnonPages decreases by -1 when removed from the rmap,
whereas nr_anon decreases by -1 when an anon folio is freed. So,
I would assume nr_anon includes those pinned and unmapped anon
folios but AnonPages doesn't.
Right, note how internally it is called "NR_ANON_MAPPED", but we ended
up calling it "AnonPages". But that's rather a legacy interface we
cannot change (fix) that easily. At least not without a config option.
At some point it might indeed be interesting to have "nr_anon_mapped",
here, but that would correspond to "is any part of this large folio
mapped". For debugging purposes in the future, that might be indeed
interesting.
"nr_anon": anon allocations (until freed -> +1)
"nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
"nr_anon_partially_mapped": anon allocations that was detected to be
partially mapped at some point -> +1
If a folio is in the swapcache, I would still want to see that it is an
anon allocation lurking around in the system. Like we do with pagecache
pages. *There* we do have the difference between "allocated" and
"mapped" already.
So likely, calling it "nr_anon" here, and tracking it on an allocation
level, is good enough for now and future proof.
If there's a significant amount of 'leaked' anon, we should consider
having a separate counter for them. For instance, if nr_anon is
100,000 and pinned/unmapped pages account for 50%, then nr_anon
alone doesn’t effectively reflect the system's state.
Right, but if you stare at the system you could tell that a significant
amount of memory is still getting consumed through existing/previous
anon mappings. Depends on how valuable that distinction really is.
--
Cheers,
David / dhildenb