On 05/04/2024 05:01, Barry Song wrote: > On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@xxxxxxxxx> wrote: >> >> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand <david@xxxxxxxxxx> wrote: >>> >>> On 04.04.24 09:21, Ryan Roberts wrote: >>>> On 03/04/2024 22:00, Barry Song wrote: >>>>> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: >>>>>> >>>>>> On 03/04/2024 09:22, David Hildenbrand wrote: >>>>>>> On 03.04.24 05:55, Barry Song wrote: >>>>>>>> From: Barry Song <v-songbaohua@xxxxxxxx> >>>>>>>> >>>>>>>> Profiling a system blindly with mTHP has become challenging due >>>>>>>> to the lack of visibility into its operations. Presenting the >>>>>>>> success rate of mTHP allocations appears to be pressing need. >>>>>>>> >>>>>>>> Recently, I've been experiencing significant difficulty debugging >>>>>>>> performance improvements and regressions without these figures. >>>>>>>> It's crucial for us to understand the true effectiveness of >>>>>>>> mTHP in real-world scenarios, especially in systems with >>>>>>>> fragmented memory. >>>>>>>> >>>>>>>> This patch sets up the framework for per-order mTHP counters, >>>>>>>> starting with the introduction of anon_alloc_success and >>>>>>>> anon_alloc_fail counters. Incorporating additional counters >>>>>>>> should now be straightforward as well. >>>>>>>> >>>>>>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> >>>>>>>> --- >>>>>>>> -v3: >>>>>>>> * save some memory as order-0 and order-1 can't be THP, Ryan; >>>>>>>> * rename to anon_alloc as right now we only support anon to address >>>>>>>> David's comment; >>>>>>>> * drop a redundant "else", Ryan >>>>>>>> >>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++ >>>>>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++ >>>>>>>> mm/memory.c | 2 ++ >>>>>>>> 3 files changed, 74 insertions(+) >>>>>>>> >>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>>>>> index e896ca4760f6..5e9af6be9537 100644 >>>>>>>> --- a/include/linux/huge_mm.h >>>>>>>> +++ b/include/linux/huge_mm.h >>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr; >>>>>>>> * (which is a limitation of the THP implementation). >>>>>>>> */ >>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1))) >>>>>>>> +#define THP_MIN_ORDER 2 >>>>>>>> /* >>>>>>>> * Mask of all large folio orders supported for file THP. >>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct >>>>>>>> vm_area_struct *vma, >>>>>>>> enforce_sysfs, orders); >>>>>>>> } >>>>>>>> +enum thp_event_item { >>>>>>>> + THP_ANON_ALLOC_SUCCESS, >>>>>>>> + THP_ANON_ALLOC_FAIL, >>>>>>>> + NR_THP_EVENT_ITEMS >>>>>>>> +}; >>>>>>> >>>>>>> Maybe use a prefix that resembles matches the enum name and is "obviously" >>>>>>> different to the ones in vm_event_item.h, like >>>>>>> >>>>>>> enum thp_event { >>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS, >>>>>>> THP_EVENT_ANON_ALLOC_FAIL, >>>>>>> __THP_EVENT_COUNT, >>>>>>> }; >>>>>> >>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever >>>>>> increments, but "stat" can increment and decrement. An event is a type of stat. >>>>>> >>>>>> You are only adding events for now, but we have identified a need for inc/dec >>>>>> stats that will be added in future. >>>>> >>>>> What about the below? >>>>> >>>>> enum thp_stat { >> >> It seems we still need to use enum thp_stat_item rather than thp_stat. >> This follows >> enum zone_stat_item >> enum numa_stat_item >> enum node_stat_item >> >> And most importantly, the below looks much better >> >> enum thp_stat_item { >> THP_STAT_ANON_ALLOC, >> THP_STAT_ANON_ALLOC_FALLBACK, >> __THP_STAT_COUNT >> }; >> >> struct thp_state { >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; >> }; >> >> DECLARE_PER_CPU(struct thp_state, thp_states); >> >> than >> >> enum thp_stat { >> THP_STAT_ANON_ALLOC, >> THP_STAT_ANON_ALLOC_FALLBACK, >> __THP_STAT_COUNT >> }; >> >> struct thp_state { >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; >> }; >> >>>>> THP_EVENT_ANON_ALLOC, >>>>> THP_EVENT_ANON_ALLOC_FALLBACK, >>>>> THP_EVENT_SWPOUT, >>>>> THP_EVENT_SWPOUT_FALLBACK, >>>>> ... >>>>> THP_NR_ANON_PAGES, >>>>> THP_NR_FILE_PAGES, >>>> >>>> I find this ambiguous; Is it the number of THPs or the number of base pages? >>>> >>>> I think David made the point about incorporating the enum name into the labels >>>> too, so that there can be no namespace confusion. How about: >>>> >>>> <enum>_<type>_<name> >>>> >>>> So: >>>> >>>> THP_STAT_EV_ANON_ALLOC >>>> THP_STAT_EV_ANON_ALLOC_FALLBACK >>>> THP_STAT_EV_ANON_PARTIAL >>>> THP_STAT_EV_SWPOUT >>>> THP_STAT_EV_SWPOUT_FALLBACK >>>> ... >>>> THP_STAT_NR_ANON >>>> THP_STAT_NR_FILE >>>> ... >>>> __THP_STAT_COUNT >>> >>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough. >> >> ok. > > Hi David, Ryan, > > I've named everything as follows. Please let me know if you have any further > suggestions before I send the updated version :-) Please treat all my comments below as optional - I'm just stating my preference! > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index e896ca4760f6..cc13fa14aa32 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct > vm_area_struct *vma, > enforce_sysfs, orders); > } > > +enum thp_stat_item { > + THP_STAT_ANON_ALLOC, > + THP_STAT_ANON_ALLOC_FALLBACK, > + __THP_STAT_COUNT > +}; > + > +struct thp_state { > + unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT]; Why using "state" here? I think "stats" is more appropriate? (as in short for "statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`. > +}; > + > +DECLARE_PER_CPU(struct thp_state, thp_states); > + > +static inline void count_thp_state(int order, enum thp_stat_item item) > +{ > + this_cpu_inc(thp_states.state[order][item]); > +} > + > #define transparent_hugepage_use_zero_page() \ > (transparent_hugepage_flags & \ > (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG)) > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9d4b2fbf6872..e704b4408181 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = { > .sysfs_ops = &kobj_sysfs_ops, > }; > > +DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}}; Does this need to be explicitly zeroed? Won't that be the default initial state, just like for regular globals? Perhaps PER_CPU is special? > + > +static unsigned long sum_thp_states(int order, enum thp_stat_item item) Again, I'd call it sum_thp_stats(). > +{ > + unsigned long sum = 0; > + int cpu; > + > + for_each_online_cpu(cpu) { > + struct thp_state *this = &per_cpu(thp_states, cpu); > + > + sum += this->state[order][item]; > + } > + > + return sum; > +} > + > +#define THP_STATE_ATTR(_name, _index) \ And THP_STATS_ATTR(); they are going to live in the "stats" directory after all. > +static ssize_t _name##_show(struct kobject *kobj, \ > + struct kobj_attribute *attr, char *buf) \ > +{ \ > + int order = to_thpsize(kobj)->order; \ > + \ > + return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \ > +} \ > +static struct kobj_attribute _name##_attr = __ATTR_RO(_name) > + > +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC); > +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK); > >> >>> >>> -- >>> Cheers, >>> >>> David / dhildenb > > Thanks > Barry