On Fri, Aug 9, 2024 at 4:23 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 08/08/2024 02:04, Barry Song wrote:
> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >
> > When an mTHP is added to the deferred_list, its partial pages
> > are unused, leading to wasted memory and potentially increasing
> > memory reclamation pressure. Tracking this number indicates
> > the extent to which userspace is partially unmapping mTHPs.
> >
> > Detailing the specifics of how unmapping occurs is quite difficult
> > and not that useful, so we adopt a simple approach: each time an
> > mTHP enters the deferred_list, we increment the count by 1; whenever
> > it leaves for any reason, we decrement the count by 1.
> >
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 5 +++++
> >  include/linux/huge_mm.h                    | 1 +
> >  mm/huge_memory.c                           | 6 ++++++
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 715f181543f6..5028d61cbe0c 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -532,6 +532,11 @@ anon_num
> >         These huge pages could be still entirely mapped and have partially
> >         unmapped and unused subpages.
> >
> > +anon_num_partial_unused
>
> Why is the user-exposed name completely different to the internal
> (MTHP_STAT_NR_ANON_SPLIT_DEFERRED) name?

My point is that users might not even know what a deferred split is;
from a user perspective they care more about whether there is any
temporary memory waste than about what the deferred list is. However,
since we already refer to it as SPLIT_DEFERRED elsewhere in the sysfs
ABI, I agree with you that we should continue using that term.

>
> > +       the number of anon huge pages which have been partially unmapped
> > +       we have in the whole system. These unmapped subpages are also
> > +       unused and temporarily wasting memory.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 294c348fe3cc..4b27a9797150 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -282,6 +282,7 @@ enum mthp_stat_item {
> >         MTHP_STAT_SPLIT_FAILED,
> >         MTHP_STAT_SPLIT_DEFERRED,
> >         MTHP_STAT_NR_ANON,
> > +       MTHP_STAT_NR_ANON_SPLIT_DEFERRED,
>
> So the existing MTHP_STAT_SPLIT_DEFERRED is counting all folios that were ever
> put on the list, and the new MTHP_STAT_NR_ANON_SPLIT_DEFERRED is counting the
> number of folios that are currently on the list?

Yep.

>
> In which case, do we need the "ANON" in the name? It's implicit for the existing
> split counters that they are anon-only. That would relate it more clearly to the
> existing MTHP_STAT_SPLIT_DEFERRED too?

ack.
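To make the "queued ever" vs "on the list right now" distinction concrete
for users, here is a minimal userspace sketch that reads both per-size
counters from sysfs and estimates the worst-case temporarily unused memory.
It assumes the existing per-size stats layout under
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/; the file
name "nr_split_deferred" for the new counter is only a placeholder, since
the final ABI name is still being settled in this thread.

/* Sketch only: reads one cumulative and one instantaneous mTHP counter. */
#include <stdio.h>
#include <stdlib.h>

static long read_stat(const char *size_kb, const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/transparent_hugepage/hugepages-%skB/stats/%s",
		 size_kb, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	const char *size_kb = "64";	/* one mTHP size to inspect, e.g. 64K */

	/* cumulative: every folio that was ever queued for deferred split */
	long ever_deferred = read_stat(size_kb, "split_deferred");
	/* instantaneous: folios sitting on the deferred list right now
	 * ("nr_split_deferred" is a placeholder name for the new counter) */
	long now_deferred = read_stat(size_kb, "nr_split_deferred");

	printf("%skB mTHP: queued-ever=%ld, on-list-now=%ld\n",
	       size_kb, ever_deferred, now_deferred);
	if (now_deferred > 0)
		printf("worst-case temporarily unused memory < %ld kB\n",
		       now_deferred * atol(size_kb));
	return 0;
}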
>
> >     __MTHP_STAT_COUNT
> >  };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b6bc2a3791e3..6083144f9fa0 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -579,6 +579,7 @@ DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
> >  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> >  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> >  DEFINE_MTHP_STAT_ATTR(anon_num, MTHP_STAT_NR_ANON);
> > +DEFINE_MTHP_STAT_ATTR(anon_num_partial_unused, MTHP_STAT_NR_ANON_SPLIT_DEFERRED);
> >
> >  static struct attribute *stats_attrs[] = {
> >         &anon_fault_alloc_attr.attr,
> > @@ -593,6 +594,7 @@ static struct attribute *stats_attrs[] = {
> >         &split_failed_attr.attr,
> >         &split_deferred_attr.attr,
> >         &anon_num_attr.attr,
> > +       &anon_num_partial_unused_attr.attr,
> >         NULL,
> >  };
> >
> > @@ -3229,6 +3231,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >         if (folio_order(folio) > 1 &&
> >             !list_empty(&folio->_deferred_list)) {
> >                 ds_queue->split_queue_len--;
> > +               mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_SPLIT_DEFERRED, -1);
> >                 /*
> >                  * Reinitialize page_deferred_list after removing the
> >                  * page from the split_queue, otherwise a subsequent
> > @@ -3291,6 +3294,7 @@ void __folio_undo_large_rmappable(struct folio *folio)
> >         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >         if (!list_empty(&folio->_deferred_list)) {
> >                 ds_queue->split_queue_len--;
> > +               mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_SPLIT_DEFERRED, -1);
> >                 list_del_init(&folio->_deferred_list);
> >         }
> >         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> > @@ -3332,6 +3336,7 @@ void deferred_split_folio(struct folio *folio)
> >         if (folio_test_pmd_mappable(folio))
> >                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >         count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> > +       mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_SPLIT_DEFERRED, 1);
> >         list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >         ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> > @@ -3379,6 +3384,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> >                         list_move(&folio->_deferred_list, &list);
> >                 } else {
> >                         /* We lost race with folio_put() */
> > +                       mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_SPLIT_DEFERRED, -1);
> >                         list_del_init(&folio->_deferred_list);
> >                         ds_queue->split_queue_len--;
> >                 }
>

Thanks
Barry
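P.S. For anyone skimming the hunks above, the bookkeeping rule they
implement (one increment on the single enqueue path, one decrement on
every path that takes a folio off the list) can be modelled in a few
lines of plain userspace C. This is only a sketch of the rule, not
kernel code; all names below are illustrative.

/*
 * Userspace model of the counter invariant: the "currently deferred"
 * count goes up exactly once when a folio is queued and down exactly
 * once whenever it leaves the list, for whatever reason.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_ORDER 10

static long nr_split_deferred[MAX_ORDER + 1]; /* models the new counter */
static long split_deferred[MAX_ORDER + 1];    /* models the cumulative counter */

struct folio_model {
	int order;
	bool on_deferred_list;
};

static void deferred_split_folio(struct folio_model *f)
{
	if (f->on_deferred_list)
		return;                 /* already queued: nothing to count */
	f->on_deferred_list = true;
	split_deferred[f->order]++;     /* cumulative: never decremented */
	nr_split_deferred[f->order]++;  /* instantaneous: counted while queued */
}

static void remove_from_deferred_list(struct folio_model *f)
{
	if (!f->on_deferred_list)
		return;
	f->on_deferred_list = false;
	nr_split_deferred[f->order]--;  /* every exit path decrements exactly once */
}

int main(void)
{
	struct folio_model a = { .order = 4 }, b = { .order = 4 };

	deferred_split_folio(&a);
	deferred_split_folio(&b);
	remove_from_deferred_list(&a);  /* e.g. split completed or folio freed */

	printf("order-4: queued-ever=%ld, on-list-now=%ld\n",
	       split_deferred[4], nr_split_deferred[4]);
	assert(split_deferred[4] == 2 && nr_split_deferred[4] == 1);
	return 0;
}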