On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 19/07/2024 01:12, Barry Song wrote:
> > On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>
> >> Expose 3 new mTHP stats for file (pagecache) folio allocations:
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
> >>
> >> This will provide some insight on the sizes of large folios being
> >> allocated for file-backed memory, and how often allocation is failing.
> >>
> >> All non-order-0 (and most order-0) folio allocations are currently done
> >> through filemap_alloc_folio(), and folios are charged in a subsequent
> >> call to filemap_add_folio(). So count file_fallback when allocation
> >> fails in filemap_alloc_folio() and count file_alloc or
> >> file_fallback_charge in filemap_add_folio(), based on whether charging
> >> succeeded or not. There are some users of filemap_add_folio() that
> >> allocate their own order-0 folio by other means, so we would not count
> >> an allocation failure in this case, but we also don't care about order-0
> >> allocations. This approach feels like it should be good enough and
> >> doesn't require any (impractically large) refactoring.
> >>
> >> The existing mTHP stats interface is reused to provide consistency to
> >> users. And because we are reusing the same interface, we can reuse the
> >> same infrastructure on the kernel side.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> >> ---
> >>  Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
> >>  include/linux/huge_mm.h                    |  3 +++
> >>  include/linux/pagemap.h                    | 16 ++++++++++++++--
> >>  mm/filemap.c                               |  6 ++++--
> >>  mm/huge_memory.c                           |  7 +++++++
> >>  5 files changed, 41 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >> index 058485daf186..d4857e457add 100644
> >> --- a/Documentation/admin-guide/mm/transhuge.rst
> >> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >> @@ -512,6 +512,19 @@ shmem_fallback_charge
> >>         falls back to using small pages even though the allocation was
> >>         successful.
> >>
> >> +file_alloc
> >> +       is incremented every time a file huge page is successfully
> >> +       allocated.
> >> +
> >> +file_fallback
> >> +       is incremented if a file huge page is attempted to be allocated
> >> +       but fails and instead falls back to using small pages.
> >> +
> >> +file_fallback_charge
> >> +       is incremented if a file huge page cannot be charged and instead
> >> +       falls back to using small pages even though the allocation was
> >> +       successful.
> >> +
> >
> > I realized that when we talk about fallback, it doesn't necessarily mean
> > small pages; it could also refer to smaller huge pages.
>
> Yes good point, I'll update the documentation to reflect that for all memory types.
>
> > anon_fault_alloc
> >         is incremented every time a huge page is successfully
> >         allocated and charged to handle a page fault.
> >
> > anon_fault_fallback
> >         is incremented if a page fault fails to allocate or charge
> >         a huge page and instead falls back to using huge pages with
> >         lower orders or small pages.
> >
> > anon_fault_fallback_charge
> >         is incremented if a page fault fails to charge a huge page and
> >         instead falls back to using huge pages with lower orders or
> >         small pages even though the allocation was successful.
> >
> > This also applies to files, right?
>
> It does in the place you highlight below, but page_cache_ra_order() just falls
> back immediately to order-0. Regardless, I think we should just document the
> user facing docs to allow for a lower high order. That gives the implementation
> more flexibility.
>
> >
> >         do {
> >                 gfp_t alloc_gfp = gfp;
> >
> >                 err = -ENOMEM;
> >                 if (order > 0)
> >                         alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> >                 folio = filemap_alloc_folio(alloc_gfp, order);
> >                 if (!folio)
> >                         continue;
> >
> >                 /* Init accessed so avoid atomic mark_page_accessed later */
> >                 if (fgp_flags & FGP_ACCESSED)
> >                         __folio_set_referenced(folio);
> >
> >                 err = filemap_add_folio(mapping, folio, index, gfp);
> >                 if (!err)
> >                         break;
> >                 folio_put(folio);
> >                 folio = NULL;
> >         } while (order-- > 0);
> >
> >> split
> >>         is incremented every time a huge page is successfully split into
> >>         smaller orders. This can happen for a variety of reasons but a
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index b8c63c3e967f..4f9109fcdded 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -123,6 +123,9 @@ enum mthp_stat_item {
> >>         MTHP_STAT_SHMEM_ALLOC,
> >>         MTHP_STAT_SHMEM_FALLBACK,
> >>         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> >> +       MTHP_STAT_FILE_ALLOC,
> >> +       MTHP_STAT_FILE_FALLBACK,
> >> +       MTHP_STAT_FILE_FALLBACK_CHARGE,
> >>         MTHP_STAT_SPLIT,
> >>         MTHP_STAT_SPLIT_FAILED,
> >>         MTHP_STAT_SPLIT_DEFERRED,
> >> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >> index 6e2f72d03176..95a147b5d117 100644
> >> --- a/include/linux/pagemap.h
> >> +++ b/include/linux/pagemap.h
> >> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
> >>  }
> >>
> >>  #ifdef CONFIG_NUMA
> >> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>  #else
> >> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>  {
> >>         return folio_alloc_noprof(gfp, order);
> >>  }
> >>  #endif
> >>
> >> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +{
> >> +       struct folio *folio;
> >> +
> >> +       folio = __filemap_alloc_folio_noprof(gfp, order);
> >> +
> >> +       if (!folio)
> >> +               count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> >> +
> >> +       return folio;
> >> +}
> >
> > Do we need to add and export __filemap_alloc_folio_noprof()?
>
> It is exported. See the below change in filemap.c. Previously
> filemap_alloc_folio_noprof() was exported, but that is now an inline and
> __filemap_alloc_folio_noprof() is exported in its place.
>
> > In any case,
> > we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and
> > will only allocate the folio instead?
>
> Sorry I don't understand what you mean by this?

Ryan, my question is whether exporting __filemap_alloc_folio_noprof() might
lead to situations where people call __filemap_alloc_folio_noprof() and
forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK). Currently,
it seems that counting is always necessary?
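For example (purely a hypothetical future caller, not something in this
patch), code like the following would build fine but silently bypass the
per-order fallback counter:

        /* hypothetical caller: allocation failure is never counted */
        folio = __filemap_alloc_folio_noprof(gfp, order);
        if (!folio)
                return -ENOMEM;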
Another option is that we still call count_mthp_stat() within
filemap_alloc_folio_noprof() and make it noinline, if
__filemap_alloc_folio_noprof() is not inline?

>
> >
> >> +
> >>  #define filemap_alloc_folio(...)                               \
> >>         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
> >>
> >> diff --git a/mm/filemap.c b/mm/filemap.c
> >> index 53d5d0410b51..131d514fca29 100644
> >> --- a/mm/filemap.c
> >> +++ b/mm/filemap.c
> >> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>         int ret;
> >>
> >>         ret = mem_cgroup_charge(folio, NULL, gfp);
> >> +       count_mthp_stat(folio_order(folio),
> >> +               ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
> >>         if (ret)
> >>                 return ret;
> >
> > Would the following be better?
> >
> >         ret = mem_cgroup_charge(folio, NULL, gfp);
> >         if (ret) {
> >                 count_mthp_stat(folio_order(folio),
> >                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
> >                 return ret;
> >         }
> >         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> >
> > Anyway, it's up to you. The code just feels a bit off to me :-)
>
> Yes, agree your version is better. I also noticed that anon and shmem increment
> FALLBACK whenever FALLBACK_CHARGE is incremented so I should apply those same
> semantics here.
>
> Thanks,
> Ryan
>
> >
> >>
> >> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>  EXPORT_SYMBOL_GPL(filemap_add_folio);
> >>
> >>  #ifdef CONFIG_NUMA
> >> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>  {
> >>         int n;
> >>         struct folio *folio;
> >> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>         }
> >>         return folio_alloc_noprof(gfp, order);
> >>  }
> >> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
> >> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
> >>  #endif
> >>
> >>  /*
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 578ac212c172..26d558e3e80f 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
> >>         .attrs = anon_stats_attrs,
> >>  };
> >>
> >> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> >> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> >> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> >> +
> >>  static struct attribute *file_stats_attrs[] = {
> >> +       &file_alloc_attr.attr,
> >> +       &file_fallback_attr.attr,
> >> +       &file_fallback_charge_attr.attr,
> >>  #ifdef CONFIG_SHMEM
> >>         &shmem_alloc_attr.attr,
> >>         &shmem_fallback_attr.attr,
> >> --
> >> 2.43.0
> >>
> >

Thanks
Barry