Hi David,

On Mon, Jul 1, 2024 at 4:56 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 30.06.24 11:48, Barry Song wrote:
> > On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>
> >> + Barry
> >>
> >> On 24/04/2024 14:51, Lance Yang wrote:
> >>> At present, the split counters in THP statistics no longer include
> >>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> >>> counters to monitor the frequency of mTHP splits. This will assist
> >>> developers in better analyzing and optimizing system performance.
> >>>
> >>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>>         split_page
> >>>         split_page_failed
> >>>         deferred_split_page
> >>>
> >>> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx>
> >>> ---
> >>>  include/linux/huge_mm.h |  3 +++
> >>>  mm/huge_memory.c        | 14 ++++++++++++--
> >>>  2 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>> index 56c7ea73090b..7b9c6590e1f7 100644
> >>> --- a/include/linux/huge_mm.h
> >>> +++ b/include/linux/huge_mm.h
> >>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >>>  	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>>  	MTHP_STAT_ANON_SWPOUT,
> >>>  	MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >>> +	MTHP_STAT_SPLIT_PAGE,
> >>> +	MTHP_STAT_SPLIT_PAGE_FAILED,
> >>> +	MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >>>  	__MTHP_STAT_COUNT
> >>>  };
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 055df5aac7c3..52db888e47a6 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> >>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>>
> >>>  static struct attribute *stats_attrs[] = {
> >>>  	&anon_fault_alloc_attr.attr,
> >>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >>>  	&anon_fault_fallback_charge_attr.attr,
> >>>  	&anon_swpout_attr.attr,
> >>>  	&anon_swpout_fallback_attr.attr,
> >>> +	&split_page_attr.attr,
> >>> +	&split_page_failed_attr.attr,
> >>> +	&deferred_split_page_attr.attr,
> >>>  	NULL,
> >>>  };
> >>>
> >>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>  	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >>>  	struct anon_vma *anon_vma = NULL;
> >>>  	struct address_space *mapping = NULL;
> >>> -	bool is_thp = folio_test_pmd_mappable(folio);
> >>> +	int order = folio_order(folio);
> >>>  	int extra_pins, ret;
> >>>  	pgoff_t end;
> >>>  	bool is_hzp;
> >>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>  	i_mmap_unlock_read(mapping);
> >>>  out:
> >>>  	xas_destroy(&xas);
> >>> -	if (is_thp)
> >>> +	if (order >= HPAGE_PMD_ORDER)
> >>>  		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> >>> +	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> >>> +				      MTHP_STAT_SPLIT_PAGE_FAILED);
> >>>  	return ret;
> >>>  }
> >>>
> >>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >>>  	if (list_empty(&folio->_deferred_list)) {
> >>>  		if (folio_test_pmd_mappable(folio))
> >>>  			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >>> +		count_mthp_stat(folio_order(folio),
> >>> +				MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>
> >> There is a very long conversation with Barry about adding a 'global "mTHP became
> >> partially mapped 1 or more processes" counter (inc only)', which terminates at
> >> [1]. There is a lot of discussion about the required semantics around the need
> >> for partial map to cover alignment and contiguity as well as whether all pages
> >> are mapped, and to trigger once it becomes partial in at least 1 process.
> >>
> >> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less
> >> information as a result. Barry, what's your view here? I'm guessing this doesn't
> >> quite solve what you are looking for?
> >
> > This doesn't quite solve what I am looking for but I still think the
> > patch has its value.
> >
> > I'm looking for a solution that can:
> >
> > * Count the amount of memory in the system for each mTHP size.
> > * Determine how much memory for each mTHP size is partially unmapped.
> >
> > For example, in a system with 16GB of memory, we might find that we have 3GB
> > of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting
> > memory at this moment. I'm uncertain whether Lance is interested in
> > this job :-)
> >
> > Counting deferred_split remains valuable as it can signal whether the system is
> > experiencing significant partial unmapping.
>
> I'll note that, especially without subpage mapcounts, in the future we
> won't have that information (how much is currently mapped) readily
> available in all cases. To obtain that information on demand, we'd have
> to scan page tables or walk the rmap.

Thanks for pointing that out!

>
> Something to keep in mind: we don't want to introduce counters that will
> be expensive to maintain longterm.

I'll keep that in mind as we move forward with any new implementations.

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
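
For anyone following along, here is a rough userspace sketch of the accounting the quoted patch adds; it is a model for discussion, not the kernel code. The enum and counter names mirror the diff, but everything else is an assumption: HPAGE_PMD_ORDER is hard-coded to 9 (4 KiB base pages, 2 MiB PMD), the counters are plain arrays rather than per-CPU stats, the range guard in count_mthp_stat() is this sketch's own, and account_split() / account_deferred_split() are hypothetical helpers standing in for the tail of split_huge_page_to_list_to_order() and the list_empty() branch of deferred_split_folio().

```c
/* Assumption: 4 KiB base pages with a 2 MiB PMD, so the PMD order is 9. */
#define HPAGE_PMD_ORDER 9

enum mthp_stat_item {
	MTHP_STAT_SPLIT_PAGE,
	MTHP_STAT_SPLIT_PAGE_FAILED,
	MTHP_STAT_DEFERRED_SPLIT_PAGE,
	__MTHP_STAT_COUNT
};

enum vm_event_item {
	THP_SPLIT_PAGE,
	THP_SPLIT_PAGE_FAILED,
	THP_DEFERRED_SPLIT_PAGE,
	__VM_EVENT_COUNT
};

/* Plain arrays stand in for the kernel's per-CPU counters (assumption). */
unsigned long mthp_stats[HPAGE_PMD_ORDER + 1][__MTHP_STAT_COUNT];
unsigned long vm_events[__VM_EVENT_COUNT];

/* Per-order counter; the range guard is this sketch's own choice. */
void count_mthp_stat(int order, enum mthp_stat_item item)
{
	if (order <= 0 || order > HPAGE_PMD_ORDER)
		return;
	mthp_stats[order][item]++;
}

/* Global THP event counter. */
void count_vm_event(enum vm_event_item item)
{
	vm_events[item]++;
}

/*
 * Hypothetical helper modelling the "out:" path of the split function:
 * the global THP event fires only for PMD-sized folios, while the
 * per-order mTHP counter fires for every order. ret == 0 means success.
 */
void account_split(int order, int ret)
{
	if (order >= HPAGE_PMD_ORDER)
		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
				      MTHP_STAT_SPLIT_PAGE_FAILED);
}

/* Hypothetical helper modelling a folio being queued for deferred split. */
void account_deferred_split(int order)
{
	if (order >= HPAGE_PMD_ORDER)
		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
	count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
}
```

In this model, account_split(4, 0) bumps only the order-4 split_page counter (a 64KB folio with 4 KiB pages), while account_split(HPAGE_PMD_ORDER, 0) bumps both the order-9 counter and the global THP_SPLIT_PAGE event. Both are simple increments at existing call sites, with no page-table or rmap walk involved.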