On Thu, Aug 8, 2024 at 8:17 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 08.08.24 10:08, David Hildenbrand wrote:
> > On 08.08.24 10:03, David Hildenbrand wrote:
> >> On 08.08.24 09:08, Barry Song wrote:
> >>> On Thu, Aug 8, 2024 at 1:05 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >>>>
> >>>> From: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>
> >>>> When a new anonymous mTHP is added to the rmap, we increase the count.
> >>>> We reduce the count whenever an mTHP is completely unmapped.
> >>>>
> >>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> >>>> ---
> >>>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>  mm/huge_memory.c                           |  2 ++
> >>>>  mm/rmap.c                                  |  3 +++
> >>>>  4 files changed, 23 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>> index 058485daf186..715f181543f6 100644
> >>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>         it would free up some memory. Pages on split queue are going to
> >>>>         be split under memory pressure, if splitting is possible.
> >>>>
> >>>> +anon_num
> >>>> +       the number of anon huge pages we have in the whole system.
> >>>> +       These huge pages could be still entirely mapped and have partially
> >>>> +       unmapped and unused subpages.
> >>>> +
> >>>>  As the system ages, allocating huge pages may be expensive as the
> >>>>  system uses memory compaction to copy data around memory to free a
> >>>>  huge page for use. There are some counters in ``/proc/vmstat`` to help
> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>> index e25d9ebfdf89..294c348fe3cc 100644
> >>>> --- a/include/linux/huge_mm.h
> >>>> +++ b/include/linux/huge_mm.h
> >>>> @@ -281,6 +281,7 @@ enum mthp_stat_item {
> >>>>         MTHP_STAT_SPLIT,
> >>>>         MTHP_STAT_SPLIT_FAILED,
> >>>>         MTHP_STAT_SPLIT_DEFERRED,
> >>>> +       MTHP_STAT_NR_ANON,
> >>>>         __MTHP_STAT_COUNT
> >>>>  };
> >>>>
> >>>> @@ -291,14 +292,24 @@ struct mthp_stat {
> >>>>  #ifdef CONFIG_SYSFS
> >>>>  DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
> >>>>
> >>>> -static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >>>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
> >>>>  {
> >>>>         if (order <= 0 || order > PMD_ORDER)
> >>>>                 return;
> >>>>
> >>>> -       this_cpu_inc(mthp_stats.stats[order][item]);
> >>>> +       this_cpu_add(mthp_stats.stats[order][item], delta);
> >>>> +}
> >>>> +
> >>>> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >>>> +{
> >>>> +       mod_mthp_stat(order, item, 1);
> >>>>  }
> >>>> +
> >>>>  #else
> >>>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
> >>>> +{
> >>>> +}
> >>>> +
> >>>>  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >>>>  {
> >>>>  }
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 697fcf89f975..b6bc2a3791e3 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -578,6 +578,7 @@ DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
> >>>>  DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
> >>>>  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> >>>>  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> >>>> +DEFINE_MTHP_STAT_ATTR(anon_num, MTHP_STAT_NR_ANON);
> >>>>
> >>>>  static struct attribute *stats_attrs[] = {
> >>>>         &anon_fault_alloc_attr.attr,
> >>>> @@ -591,6 +592,7 @@ static struct attribute *stats_attrs[] = {
> >>>>         &split_attr.attr,
> >>>>         &split_failed_attr.attr,
> >>>>         &split_deferred_attr.attr,
> >>>> +       &anon_num_attr.attr,
> >>>>         NULL,
> >>>>  };
> >>>>
> >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>> index 901950200957..2b722f26224c 100644
> >>>> --- a/mm/rmap.c
> >>>> +++ b/mm/rmap.c
> >>>> @@ -1467,6 +1467,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> >>>>         }
> >>>>
> >>>>         __folio_mod_stat(folio, nr, nr_pmdmapped);
> >>>> +       mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
> >>>>  }
> >>>>
> >>>>  static __always_inline void __folio_add_file_rmap(struct folio *folio,
> >>>> @@ -1582,6 +1583,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>>             list_empty(&folio->_deferred_list))
> >>>>                 deferred_split_folio(folio);
> >>>>         __folio_mod_stat(folio, -nr, -nr_pmdmapped);
> >>>> +       if (folio_test_anon(folio) && !atomic_read(mapped))
> >>>
> >>> could have a risk here two processes unmap at the same time, so
> >>> they both get zero on atomic_read(mapped)? should read the value
> >>> of atomic_dec_return() instead to confirm we are the last one
> >>> doing unmap?
> >>
> >> I would appreciate if we leave the rmap out here.
> >>
> >> Can't we handle that when actually freeing the folio? folio_test_anon()
> >> is sticky until freed.
> >
> > To be clearer: we increment the counter when we set a folio anon, which
> > should indeed only happen in folio_add_new_anon_rmap(). We'll have to
> > ignore hugetlb here where we do it in hugetlb_add_new_anon_rmap().
> >
> > Then, when we free an anon folio we decrement the counter. (hugetlb
> > should clear the anon flag when an anon folio gets freed back to its
> > allocator -- likely that is already done).
> >
>
> Sorry that I am talking to myself: I'm wondering if we also have to
> adjust the counter when splitting a large folio to multiple
> smaller-but-still-large folios.

Yes, if we don't do the decrement in remove_rmap, because we could
allocate them as mTHP but free them as nr_pages small folios.

>
> --
> Cheers,
>
> David / dhildenb
>
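
As a rough sketch of the free-path accounting discussed above (this is not
part of the patch; the helper name and call site are hypothetical, while
mod_mthp_stat(), MTHP_STAT_NR_ANON, folio_test_anon() and folio_order() are
the ones used in the patch and the existing folio API):

/*
 * Sketch only: decrement the per-order counter once, when the anon folio
 * is finally freed, instead of in __folio_remove_rmap().  folio_test_anon()
 * stays set until the folio is freed, so two tasks unmapping the last pages
 * concurrently cannot both decrement, unlike the !atomic_read(mapped) check.
 * Assumes <linux/huge_mm.h> and <linux/mm.h>; mod_mthp_stat() already
 * ignores order 0, so small folios are filtered out automatically.
 */
static inline void folio_account_anon_freed(struct folio *folio)
{
	if (folio_test_anon(folio))
		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, -1);
}

Splitting an order-N anon folio into smaller-but-still-large folios would
then also need something like mod_mthp_stat(old_order, MTHP_STAT_NR_ANON, -1)
plus mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, nr_new_folios), per the
point David raises above.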