On Mon, Jul 1, 2024 at 4:16 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > > On 30/06/2024 12:34, Lance Yang wrote: > > Hi Barry, > > > > Thanks for following up! > > > > On Sun, Jun 30, 2024 at 5:48 PM Barry Song <baohua@xxxxxxxxxx> wrote: > >> > >> On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >>> > >>> + Barry > >>> > >>> On 24/04/2024 14:51, Lance Yang wrote: > >>>> At present, the split counters in THP statistics no longer include > >>>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split > >>>> counters to monitor the frequency of mTHP splits. This will assist > >>>> developers in better analyzing and optimizing system performance. > >>>> > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats > >>>> split_page > >>>> split_page_failed > >>>> deferred_split_page > >>>> > >>>> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx> > >>>> --- > >>>> include/linux/huge_mm.h | 3 +++ > >>>> mm/huge_memory.c | 14 ++++++++++++-- > >>>> 2 files changed, 15 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>> index 56c7ea73090b..7b9c6590e1f7 100644 > >>>> --- a/include/linux/huge_mm.h > >>>> +++ b/include/linux/huge_mm.h > >>>> @@ -272,6 +272,9 @@ enum mthp_stat_item { > >>>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, > >>>> MTHP_STAT_ANON_SWPOUT, > >>>> MTHP_STAT_ANON_SWPOUT_FALLBACK, > >>>> + MTHP_STAT_SPLIT_PAGE, > >>>> + MTHP_STAT_SPLIT_PAGE_FAILED, > >>>> + MTHP_STAT_DEFERRED_SPLIT_PAGE, > >>>> __MTHP_STAT_COUNT > >>>> }; > >>>> > >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>>> index 055df5aac7c3..52db888e47a6 100644 > >>>> --- a/mm/huge_memory.c > >>>> +++ b/mm/huge_memory.c > >>>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); > >>>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); > >>>> DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT); > >>>> DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK); > >>>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE); > >>>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED); > >>>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE); > >>>> > >>>> static struct attribute *stats_attrs[] = { > >>>> &anon_fault_alloc_attr.attr, > >>>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = { > >>>> &anon_fault_fallback_charge_attr.attr, > >>>> &anon_swpout_attr.attr, > >>>> &anon_swpout_fallback_attr.attr, > >>>> + &split_page_attr.attr, > >>>> + &split_page_failed_attr.attr, > >>>> + &deferred_split_page_attr.attr, > >>>> NULL, > >>>> }; > >>>> > >>>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > >>>> XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order); > >>>> struct anon_vma *anon_vma = NULL; > >>>> struct address_space *mapping = NULL; > >>>> - bool is_thp = folio_test_pmd_mappable(folio); > >>>> + int order = folio_order(folio); > >>>> int extra_pins, ret; > >>>> pgoff_t end; > >>>> bool is_hzp; > >>>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > >>>> i_mmap_unlock_read(mapping); > >>>> out: > >>>> xas_destroy(&xas); > >>>> - if (is_thp) > >>>> + if (order >= HPAGE_PMD_ORDER) > >>>> count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED); > >>>> + count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE : > >>>> + MTHP_STAT_SPLIT_PAGE_FAILED); > >>>> return ret; > >>>> } > >>>> > >>>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio) > >>>> if (list_empty(&folio->_deferred_list)) { > >>>> if (folio_test_pmd_mappable(folio)) > >>>> count_vm_event(THP_DEFERRED_SPLIT_PAGE); > >>>> + count_mthp_stat(folio_order(folio), > >>>> + MTHP_STAT_DEFERRED_SPLIT_PAGE); > >>> > >>> There is a very long conversation with Barry about adding a 'global "mTHP became > >>> partially mapped 1 or more processes" counter (inc only)', which terminates at > >>> [1]. There is a lot of discussion about the required semantics around the need > >>> for partial map to cover alignment and contiguity as well as whether all pages > >>> are mapped, and to trigger once it becomes partial in at least 1 process. > >>> > >>> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less > >>> information as a result. Barry, what's your view here? I'm guessing this doesn't > >>> quite solve what you are looking for? > >> > >> This doesn't quite solve what I am looking for but I still think the > >> patch has its value. > >> > >> I'm looking for a solution that can: > >> > >> * Count the amount of memory in the system for each mTHP size. > >> * Determine how much memory for each mTHP size is partially unmapped. > >> > >> For example, in a system with 16GB of memory, we might find that we have 3GB > >> of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting > >> memory at this moment. I'm uncertain whether Lance is interested in > >> this job :-) > > > > Nice, that's an interesting/valuable job for me ;) > > > > Let's do it separately, as 'split' and friends probably can’t be the > > solution you > > mentioned above, IMHO. > > > > Hmm... I don't have a good idea about the solution for now, but will > > think it over > > and come back to discuss it here. > > I have a grad starting in a couple of weeks and I had been planning to initially > ask him to look at this to help him get up to speed on mTHP/mm stuff. But I have > plenty of other things for him to do if Lance wants to take this :) I'm very happy to do that, but it doesn't have to be just me - anyone with a better idea can take it on ;) Thanks, Lance > > > > >> > >> Counting deferred_split remains valuable as it can signal whether the system is > >> experiencing significant partial unmapping. > > > > Have a nice weekend! > > Lance > > > >> > >>> > >>> [1] https://lore.kernel.org/linux-mm/6cc7d781-884f-4d8f-a175-8609732b87eb@xxxxxxx/ > >>> > >>> Thanks, > >>> Ryan > >>> > >>>> list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); > >>>> ds_queue->split_queue_len++; > >>>> #ifdef CONFIG_MEMCG > >>> > >> > >> Thanks > >> Barry >