On Thu, Aug 22, 2024 at 10:01 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 22.08.24 11:21, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 22.08.24 10:44, Barry Song wrote:
> >>> On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >>>>
> >>>> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 12.08.24 00:49, Barry Song wrote:
> >>>>>> From: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>>>
> >>>>>> Anon large folios come from three places:
> >>>>>> 1. newly allocated large folios in page faults, which call
> >>>>>>    folio_add_new_anon_rmap() for rmap;
> >>>>>> 2. a large folio is split into multiple lower-order large folios;
> >>>>>> 3. a large folio is migrated to a new large folio.
> >>>>>>
> >>>>>> In all three cases above, we increase nr_anon by 1.
> >>>>>>
> >>>>>> Anon large folios can go away by being split or being freed;
> >>>>>> in these cases, we reduce the count by 1.
> >>>>>>
> >>>>>> Folios that have been added to the swap cache but have not yet received
> >>>>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>>>> statistics in /proc/meminfo.
> >>>>>
> >>>>> Thinking out loud, I wonder if we want to have something like that for
> >>>>> any anon folios (including small ones).
> >>>>>
> >>>>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>>>> be quite interesting to see that these are actually anon pages still
> >>>>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>>>> freed for some reason.
> >>>>>
> >>>>> The whole "AnonPages" counter thingy is just confusing, it only counts
> >>>>> what's currently mapped ... so we'd want something different.
> >>>>>
> >>>>> But it's okay to start with large folios only, there we have a new
> >>>>> interface without that legacy stuff :)
> >>>>
> >>>> We have two options to do this:
> >>>> 1. add a new, separate nr_anon_unmapped interface which
> >>>>    counts unmapped anon memory only;
> >>>> 2. let nr_anon count both mapped and unmapped anon
> >>>>    folios.
> >>>>
> >>>> I would assume 1 is clearer, as AnonPages has been
> >>>> there for years. And by counting all mapped and unmapped together,
> >>>> we would still lack an approach for finding the anon memory leak
> >>>> problem you mentioned.
> >>>>
> >>>> We are right now comparing nr_anon (including mapped folios only)
> >>>> with AnonPages to get the distribution of different folio sizes in
> >>>> performance profiling.
> >>>>
> >>>> unmapped_nr_anon should normally be quite small; otherwise,
> >>>> something must be wrong.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>>> ---
> >>>>>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>>>  mm/huge_memory.c                           | 13 ++++++++++---
> >>>>>>  mm/migrate.c                               |  4 ++++
> >>>>>>  mm/page_alloc.c                            |  5 ++++-
> >>>>>>  mm/rmap.c                                  |  1 +
> >>>>>>  6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> index 058485daf186..9fdfb46e4560 100644
> >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>>>         it would free up some memory. Pages on split queue are going to
> >>>>>>         be split under memory pressure, if splitting is possible.
> >>>>>>
> >>>>>> +nr_anon
> >>>>>> +       the number of anon huge pages we have in the whole system.
> >>>>>
> >>>>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>>>> "huge pages" ... :)
> >>>>>
> >>>>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>>>
> >>>>>
> >>>>
> >>>> If we might have an unmapped_anon counter later, maybe rename this to
> >>>> nr_anon_mapped? And the new interface we will have in the future
> >>>> might be nr_anon_unmapped?
> >>
> >> We really shouldn't be using the mapped/unmapped terminology here ... we
> >> allocated pages and turned them into anonymous folios. At some point we
> >> free them. That's the lifecycle.
> >>
> >>>
> >>> On second thought, this might be incorrect as well. Concepts like 'anon',
> >>> 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> >>> unmapped but is still pinned and not yet freed, it isn't technically an
> >>> 'anon' anymore?
> >>
> >> It's just not mapped, and cannot get mapped, anymore. In the memdesc
> >> world, we'd be freeing the "struct anon" or "struct folio" once the last
> >> refcount goes to 0, not once it is (e.g., temporarily during a failed
> >> migration?) unmapped.
> >>
> >> The important part to me would be: this is memory that was allocated for
> >> anonymous memory, and it's still around for some reason and not getting
> >> freed. Usually, we would expect anon memory to get freed fairly quickly
> >> once unmapped, except when there are long-term pinnings or other types
> >> of memory leaks.
> >>
> >> You could happily continue using these anon pages via vmsplice() or
> >> similar, even though the original page table mapping was torn down.
> >>
> >>>
> >>> On the other hand, implementing nr_anon_unmapped could be extremely
> >>> tricky. I have no idea how to implement it, as we are losing those
> >>> mapping flags.
> >>
> >> folio_mapcount() can tell you efficiently whether a folio is mapped or
> >> not -- and that information will stay for all eternity as long as we
> >> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> >> are mapped, but at least "is any page of this large folio mapped".
> >
> > Exactly. AnonPages decreases by 1 when a page is removed from the rmap,
> > whereas nr_anon decreases by 1 when an anon folio is freed. So
> > I would assume nr_anon includes those pinned and unmapped anon
> > folios but AnonPages doesn't.
>
> Right, note how internally it is called "NR_ANON_MAPPED", but we ended
> up calling it "AnonPages". But that's rather a legacy interface we
> cannot change (fix) that easily. At least not without a config option.
>
> At some point it might indeed be interesting to have "nr_anon_mapped"
> here, but that would correspond to "is any part of this large folio
> mapped". For debugging purposes in the future, that might indeed be
> interesting.
>
> "nr_anon": anon allocations (until freed -> +1)
> "nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
> "nr_anon_partially_mapped": anon allocations that were detected to be
> partially mapped at some point -> +1
>
> If a folio is in the swapcache, I would still want to see that it is an
> anon allocation lurking around in the system, like we do with pagecache
> pages. *There* we do have the difference between "allocated" and
> "mapped" already.
>
> So likely, calling it "nr_anon" here, and tracking it on an allocation
> level, is good enough for now and future proof.

Right. I plan to send v3 tomorrow to at least unblock Usama's series, in
case he wants to rebase on top of it.

>
> > If there's a significant amount of 'leaked' anon, we should consider
> > having a separate counter for it. For instance, if nr_anon is
> > 100,000 and pinned/unmapped pages account for 50%, then nr_anon
> > alone doesn't effectively reflect the system's state.
>
> Right, but if you stare at the system you could tell that a significant
> amount of memory is still getting consumed through existing/previous
> anon mappings. Depends on how valuable that distinction really is.
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry