On Thu, Apr 4, 2024 at 8:21 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > > On 03/04/2024 22:00, Barry Song wrote: > > On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >> > >> On 03/04/2024 09:22, David Hildenbrand wrote: > >>> On 03.04.24 05:55, Barry Song wrote: > >>>> From: Barry Song <v-songbaohua@xxxxxxxx> > >>>> > >>>> Profiling a system blindly with mTHP has become challenging due > >>>> to the lack of visibility into its operations. Presenting the > >>>> success rate of mTHP allocations appears to be pressing need. > >>>> > >>>> Recently, I've been experiencing significant difficulty debugging > >>>> performance improvements and regressions without these figures. > >>>> It's crucial for us to understand the true effectiveness of > >>>> mTHP in real-world scenarios, especially in systems with > >>>> fragmented memory. > >>>> > >>>> This patch sets up the framework for per-order mTHP counters, > >>>> starting with the introduction of anon_alloc_success and > >>>> anon_alloc_fail counters. Incorporating additional counters > >>>> should now be straightforward as well. > >>>> > >>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > >>>> --- > >>>> -v3: > >>>> * save some memory as order-0 and order-1 can't be THP, Ryan; > >>>> * rename to anon_alloc as right now we only support anon to address > >>>> David's comment; > >>>> * drop a redundant "else", Ryan > >>>> > >>>> include/linux/huge_mm.h | 18 ++++++++++++++ > >>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++ > >>>> mm/memory.c | 2 ++ > >>>> 3 files changed, 74 insertions(+) > >>>> > >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>> index e896ca4760f6..5e9af6be9537 100644 > >>>> --- a/include/linux/huge_mm.h > >>>> +++ b/include/linux/huge_mm.h > >>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr; > >>>> * (which is a limitation of the THP implementation). > >>>> */ > >>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1))) > >>>> +#define THP_MIN_ORDER 2 > >>>> /* > >>>> * Mask of all large folio orders supported for file THP. > >>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct > >>>> vm_area_struct *vma, > >>>> enforce_sysfs, orders); > >>>> } > >>>> +enum thp_event_item { > >>>> + THP_ANON_ALLOC_SUCCESS, > >>>> + THP_ANON_ALLOC_FAIL, > >>>> + NR_THP_EVENT_ITEMS > >>>> +}; > >>> > >>> Maybe use a prefix that resembles matches the enum name and is "obviously" > >>> different to the ones in vm_event_item.h, like > >>> > >>> enum thp_event { > >>> THP_EVENT_ANON_ALLOC_SUCCESS, > >>> THP_EVENT_ANON_ALLOC_FAIL, > >>> __THP_EVENT_COUNT, > >>> }; > >> > >> FWIW, I'd personally replace "event" with "stat". For me "event" only ever > >> increments, but "stat" can increment and decrement. An event is a type of stat. > >> > >> You are only adding events for now, but we have identified a need for inc/dec > >> stats that will be added in future. > > > > What about the below? > > > > enum thp_stat { > > THP_EVENT_ANON_ALLOC, > > THP_EVENT_ANON_ALLOC_FALLBACK, > > THP_EVENT_SWPOUT, > > THP_EVENT_SWPOUT_FALLBACK, > > ... > > THP_NR_ANON_PAGES, > > THP_NR_FILE_PAGES, > > I find this ambiguous; Is it the number of THPs or the number of base pages? Indeed. I've noticed that the kernel seems rather unpredictable in terms of both the number of THPs and the number of base pages. for example: __node_stat_mod_folio(folio, NR_FILE_PAGES, nr); __node_stat_mod_folio(folio, NR_FILE_PAGES, -nr); add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages); __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); The previous statements pertain to base pages encountered when dealing with a large folio.; but the below means THP number: __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped); My feeling is that when assessing the true memory size, base pages are the primary consideration. However, I haven't addressed this aspect yet, and we can defer this discussion until I delve into it further upon arrival :-) Currently, I'm exclusively employing it as a demonstration to differentiate between "event" and "stat", aiming to provide a glimpse into the future perspective. > > I think David made the point about incorporating the enum name into the labels > too, so that there can be no namespace confusion. How about: > > <enum>_<type>_<name> > > So: > > THP_STAT_EV_ANON_ALLOC > THP_STAT_EV_ANON_ALLOC_FALLBACK > THP_STAT_EV_ANON_PARTIAL > THP_STAT_EV_SWPOUT > THP_STAT_EV_SWPOUT_FALLBACK > ... > THP_STAT_NR_ANON > THP_STAT_NR_FILE > ... > __THP_STAT_COUNT > > I'm not sure if we need to prefix SWPOUT with ANON? (might we additionally want > SHMEM in future? > > I'm aware I'm bikeshedding at this point... sorry :) No problem at all. I'm open to discussing it further so that I can better prepare for version 4. > > > > ... > > __THP_STAT_COUNT, > > }; > > > > THP_EVENT can only increase; THP_NR_ANON/FILE_PAGES can increase or > > decrease. Thus, their names have no "EVENT". > > Thanks Barry