Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 5, 2024 at 8:18 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 05/04/2024 05:01, Barry Song wrote:
> > On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >>
> >> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>
> >>> On 04.04.24 09:21, Ryan Roberts wrote:
> >>>> On 03/04/2024 22:00, Barry Song wrote:
> >>>>> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>>>>>
> >>>>>> On 03/04/2024 09:22, David Hildenbrand wrote:
> >>>>>>> On 03.04.24 05:55, Barry Song wrote:
> >>>>>>>> From: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>>>>>
> >>>>>>>> Profiling a system blindly with mTHP has become challenging due
> >>>>>>>> to the lack of visibility into its operations. Presenting the
> >>>>>>>> success rate of mTHP allocations appears to be pressing need.
> >>>>>>>>
> >>>>>>>> Recently, I've been experiencing significant difficulty debugging
> >>>>>>>> performance improvements and regressions without these figures.
> >>>>>>>> It's crucial for us to understand the true effectiveness of
> >>>>>>>> mTHP in real-world scenarios, especially in systems with
> >>>>>>>> fragmented memory.
> >>>>>>>>
> >>>>>>>> This patch sets up the framework for per-order mTHP counters,
> >>>>>>>> starting with the introduction of anon_alloc_success and
> >>>>>>>> anon_alloc_fail counters.  Incorporating additional counters
> >>>>>>>> should now be straightforward as well.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>>>>> ---
> >>>>>>>>    -v3:
> >>>>>>>>    * save some memory as order-0 and order-1 can't be THP, Ryan;
> >>>>>>>>    * rename to anon_alloc as right now we only support anon to address
> >>>>>>>>      David's comment;
> >>>>>>>>    * drop a redundant "else", Ryan
> >>>>>>>>
> >>>>>>>>    include/linux/huge_mm.h | 18 ++++++++++++++
> >>>>>>>>    mm/huge_memory.c        | 54 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>    mm/memory.c             |  2 ++
> >>>>>>>>    3 files changed, 74 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>>>>>> index e896ca4760f6..5e9af6be9537 100644
> >>>>>>>> --- a/include/linux/huge_mm.h
> >>>>>>>> +++ b/include/linux/huge_mm.h
> >>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr;
> >>>>>>>>     * (which is a limitation of the THP implementation).
> >>>>>>>>     */
> >>>>>>>>    #define THP_ORDERS_ALL_ANON    ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
> >>>>>>>> +#define THP_MIN_ORDER        2
> >>>>>>>>      /*
> >>>>>>>>     * Mask of all large folio orders supported for file THP.
> >>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct
> >>>>>>>> vm_area_struct *vma,
> >>>>>>>>                          enforce_sysfs, orders);
> >>>>>>>>    }
> >>>>>>>>    +enum thp_event_item {
> >>>>>>>> +    THP_ANON_ALLOC_SUCCESS,
> >>>>>>>> +    THP_ANON_ALLOC_FAIL,
> >>>>>>>> +    NR_THP_EVENT_ITEMS
> >>>>>>>> +};
> >>>>>>>
> >>>>>>> Maybe use a prefix that resembles matches the enum name and is "obviously"
> >>>>>>> different to the ones in vm_event_item.h, like
> >>>>>>>
> >>>>>>> enum thp_event {
> >>>>>>>      THP_EVENT_ANON_ALLOC_SUCCESS,
> >>>>>>>      THP_EVENT_ANON_ALLOC_FAIL,
> >>>>>>>      __THP_EVENT_COUNT,
> >>>>>>> };
> >>>>>>
> >>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever
> >>>>>> increments, but "stat" can increment and decrement. An event is a type of stat.
> >>>>>>
> >>>>>> You are only adding events for now, but we have identified a need for inc/dec
> >>>>>> stats that will be added in future.
> >>>>>
> >>>>> What about the below?
> >>>>>
> >>>>> enum thp_stat {
> >>
> >> It seems we still need to use enum thp_stat_item rather than thp_stat.
> >> This follows
> >> enum zone_stat_item
> >> enum numa_stat_item
> >> enum node_stat_item
> >>
> >> And most importantly, the below looks much better
> >>
> >>        enum thp_stat_item {
> >>               THP_STAT_ANON_ALLOC,
> >>               THP_STAT_ANON_ALLOC_FALLBACK,
> >>               __THP_STAT_COUNT
> >>        };
> >>
> >>        struct thp_state {
> >>               unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> >>        };
> >>
> >>        DECLARE_PER_CPU(struct thp_state, thp_states);
> >>
> >> than
> >>
> >>        enum thp_stat {
> >>               THP_STAT_ANON_ALLOC,
> >>               THP_STAT_ANON_ALLOC_FALLBACK,
> >>               __THP_STAT_COUNT
> >>        };
> >>
> >>        struct thp_state {
> >>               unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> >>        };
> >>
> >>>>>     THP_EVENT_ANON_ALLOC,
> >>>>>     THP_EVENT_ANON_ALLOC_FALLBACK,
> >>>>>     THP_EVENT_SWPOUT,
> >>>>>     THP_EVENT_SWPOUT_FALLBACK,
> >>>>>     ...
> >>>>>     THP_NR_ANON_PAGES,
> >>>>>     THP_NR_FILE_PAGES,
> >>>>
> >>>> I find this ambiguous; Is it the number of THPs or the number of base pages?
> >>>>
> >>>> I think David made the point about incorporating the enum name into the labels
> >>>> too, so that there can be no namespace confusion. How about:
> >>>>
> >>>> <enum>_<type>_<name>
> >>>>
> >>>> So:
> >>>>
> >>>> THP_STAT_EV_ANON_ALLOC
> >>>> THP_STAT_EV_ANON_ALLOC_FALLBACK
> >>>> THP_STAT_EV_ANON_PARTIAL
> >>>> THP_STAT_EV_SWPOUT
> >>>> THP_STAT_EV_SWPOUT_FALLBACK
> >>>> ...
> >>>> THP_STAT_NR_ANON
> >>>> THP_STAT_NR_FILE
> >>>> ...
> >>>> __THP_STAT_COUNT
> >>>
> >>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough.
> >>
> >> ok.
> >
> > Hi David, Ryan,
> >
> > I've named everything as follows. Please let me know if you have any further
> > suggestions before I send the updated version :-)
>
> Please treat all my comments below as optional - I'm just stating my preference!
>
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index e896ca4760f6..cc13fa14aa32 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct
> > vm_area_struct *vma,
> >                                           enforce_sysfs, orders);
> >  }
> >
> > +enum thp_stat_item {
> > +       THP_STAT_ANON_ALLOC,
> > +       THP_STAT_ANON_ALLOC_FALLBACK,
> > +       __THP_STAT_COUNT
> > +};
> > +
> > +struct thp_state {
> > +       unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
>
> Why using "state" here? I think "stats" is more appropriate? (as in short for
> "statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`.

Sounds good.

>
> > +};
> > +
> > +DECLARE_PER_CPU(struct thp_state, thp_states);
> > +
> > +static inline void count_thp_state(int order, enum thp_stat_item item)
> > +{
> > +       this_cpu_inc(thp_states.state[order][item]);
> > +}
> > +
> >  #define transparent_hugepage_use_zero_page()                           \
> >         (transparent_hugepage_flags &                                   \
> >          (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 9d4b2fbf6872..e704b4408181 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = {
> >         .sysfs_ops = &kobj_sysfs_ops,
> >  };
> >
> > +DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}};
>
> Does this need to be explicitly zeroed? Won't that be the default initial state,
> just like for regular globals? Perhaps PER_CPU is special?

I really don't know. uninitialized data is usually put in bss section so kernel
will memset them to 0. but uninitialized per-cpu data is still in .data..percpu
section and I can't find where memset is done in this area as this section
can also contain initialized per-cpu data. Will compilers fill zero for
uninitialized per-cpu data automatically so kernel just thinks they are
initialized 0 data?

anyway, I copied the code from vmstat
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};

>
> > +
> > +static unsigned long sum_thp_states(int order, enum thp_stat_item item)
>
> Again, I'd call it sum_thp_stats().

sounds good.

>
> > +{
> > +       unsigned long sum = 0;
> > +       int cpu;
> > +
> > +       for_each_online_cpu(cpu) {
> > +               struct thp_state *this = &per_cpu(thp_states, cpu);
> > +
> > +               sum += this->state[order][item];
> > +       }
> > +
> > +       return sum;
> > +}
> > +
> > +#define THP_STATE_ATTR(_name, _index)                                  \
>
> And THP_STATS_ATTR(); they are going to live in the "stats" directory after all.

sounds good.

>
> > +static ssize_t _name##_show(struct kobject *kobj,                      \
> > +                       struct kobj_attribute *attr, char *buf)         \
> > +{                                                                      \
> > +       int order = to_thpsize(kobj)->order;                            \
> > +                                                                       \
> > +       return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \
> > +}                                                                      \
> > +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> > +
> > +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC);
> > +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK);
> >
> >>
> >>>
> >>> --
> >>> Cheers,
> >>>
> >>> David / dhildenb
> >
> > Thanks
> > Barry
>





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux