On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 19/07/2024 01:12, Barry Song wrote:
> > On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>
> >> Expose 3 new mTHP stats for file (pagecache) folio allocations:
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
> >> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
> >>
> >> This will provide some insight on the sizes of large folios being
> >> allocated for file-backed memory, and how often allocation is failing.
> >>
> >> All non-order-0 (and most order-0) folio allocations are currently done
> >> through filemap_alloc_folio(), and folios are charged in a subsequent
> >> call to filemap_add_folio(). So count file_fallback when allocation
> >> fails in filemap_alloc_folio() and count file_alloc or
> >> file_fallback_charge in filemap_add_folio(), based on whether charging
> >> succeeded or not. There are some users of filemap_add_folio() that
> >> allocate their own order-0 folio by other means, so we would not count
> >> an allocation failure in this case, but we also don't care about order-0
> >> allocations. This approach feels like it should be good enough and
> >> doesn't require any (impractically large) refactoring.
> >>
> >> The existing mTHP stats interface is reused to provide consistency to
> >> users. And because we are reusing the same interface, we can reuse the
> >> same infrastructure on the kernel side.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> >> ---
> >>  Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
> >>  include/linux/huge_mm.h                    |  3 +++
> >>  include/linux/pagemap.h                    | 16 ++++++++++++++--
> >>  mm/filemap.c                               |  6 ++++--
> >>  mm/huge_memory.c                           |  7 +++++++
> >>  5 files changed, 41 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >> index 058485daf186..d4857e457add 100644
> >> --- a/Documentation/admin-guide/mm/transhuge.rst
> >> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >> @@ -512,6 +512,19 @@ shmem_fallback_charge
> >>         falls back to using small pages even though the allocation was
> >>         successful.
> >>
> >> +file_alloc
> >> +       is incremented every time a file huge page is successfully
> >> +       allocated.
> >> +
> >> +file_fallback
> >> +       is incremented if a file huge page is attempted to be allocated
> >> +       but fails and instead falls back to using small pages.
> >> +
> >> +file_fallback_charge
> >> +       is incremented if a file huge page cannot be charged and instead
> >> +       falls back to using small pages even though the allocation was
> >> +       successful.
> >> +
> >
> > I realized that when we talk about fallback, it doesn't necessarily mean
> > small pages; it could also refer to smaller huge pages.
>
> Yes good point, I'll update the documentation to reflect that for all memory types.
>
> > anon_fault_alloc
> >         is incremented every time a huge page is successfully
> >         allocated and charged to handle a page fault.
> >
> > anon_fault_fallback
> >         is incremented if a page fault fails to allocate or charge
> >         a huge page and instead falls back to using huge pages with
> >         lower orders or small pages.
> >
> > anon_fault_fallback_charge
> >         is incremented if a page fault fails to charge a huge page and
> >         instead falls back to using huge pages with lower orders or
> >         small pages even though the allocation was successful.
> >
> > This also applies to files, right?
>
> It does in the place you highlight below, but page_cache_ra_order() just falls
> back immediately to order-0. Regardless, I think we should just document the
> user facing docs to allow for a lower high order. That gives the implementation
> more flexibility.
>
> >
> >         do {
> >                 gfp_t alloc_gfp = gfp;
> >
> >                 err = -ENOMEM;
> >                 if (order > 0)
> >                         alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> >                 folio = filemap_alloc_folio(alloc_gfp, order);
> >                 if (!folio)
> >                         continue;
> >
> >                 /* Init accessed so avoid atomic mark_page_accessed later */
> >                 if (fgp_flags & FGP_ACCESSED)
> >                         __folio_set_referenced(folio);
> >
> >                 err = filemap_add_folio(mapping, folio, index, gfp);
> >                 if (!err)
> >                         break;
> >                 folio_put(folio);
> >                 folio = NULL;
> >         } while (order-- > 0);
> >
> >> split
> >>         is incremented every time a huge page is successfully split into
> >>         smaller orders. This can happen for a variety of reasons but a
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index b8c63c3e967f..4f9109fcdded 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -123,6 +123,9 @@ enum mthp_stat_item {
> >>         MTHP_STAT_SHMEM_ALLOC,
> >>         MTHP_STAT_SHMEM_FALLBACK,
> >>         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> >> +       MTHP_STAT_FILE_ALLOC,
> >> +       MTHP_STAT_FILE_FALLBACK,
> >> +       MTHP_STAT_FILE_FALLBACK_CHARGE,
> >>         MTHP_STAT_SPLIT,
> >>         MTHP_STAT_SPLIT_FAILED,
> >>         MTHP_STAT_SPLIT_DEFERRED,
> >> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >> index 6e2f72d03176..95a147b5d117 100644
> >> --- a/include/linux/pagemap.h
> >> +++ b/include/linux/pagemap.h
> >> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
> >>  }
> >>
> >>  #ifdef CONFIG_NUMA
> >> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>  #else
> >> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>  {
> >>         return folio_alloc_noprof(gfp, order);
> >>  }
> >>  #endif
> >>
> >> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +{
> >> +       struct folio *folio;
> >> +
> >> +       folio = __filemap_alloc_folio_noprof(gfp, order);
> >> +
> >> +       if (!folio)
> >> +               count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> >> +
> >> +       return folio;
> >> +}
> >
> > Do we need to add and export __filemap_alloc_folio_noprof()?
>
> It is exported. See the below change in filemap.c. Previously
> filemap_alloc_folio_noprof() was exported, but that is now an inline and
> __filemap_alloc_folio_noprof() is exported in its place.
>
> > In any case,
> > we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and
> > will only allocate the folio instead?
>
> Sorry I don't understand what you mean by this?

Ryan, my question is whether exporting __filemap_alloc_folio_noprof() might
lead to situations where people call __filemap_alloc_folio_noprof() and
forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK). Currently,
it seems that counting is always necessary?
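For example (purely a hypothetical future caller, not something in this
patch), code like the following would build fine but silently bypass the
per-order fallback counter:

        /* hypothetical caller: allocation failure is never counted */
        folio = __filemap_alloc_folio_noprof(gfp, order);
        if (!folio)
                return -ENOMEM;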
Another option is that we still call count_mthp_stat() within
filemap_alloc_folio_noprof() and make it noinline, if
__filemap_alloc_folio_noprof() is not inline?

>
> >
> >> +
> >>  #define filemap_alloc_folio(...)                               \
> >>         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
> >>
> >> diff --git a/mm/filemap.c b/mm/filemap.c
> >> index 53d5d0410b51..131d514fca29 100644
> >> --- a/mm/filemap.c
> >> +++ b/mm/filemap.c
> >> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>         int ret;
> >>
> >>         ret = mem_cgroup_charge(folio, NULL, gfp);
> >> +       count_mthp_stat(folio_order(folio),
> >> +               ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
> >>         if (ret)
> >>                 return ret;
> >
> > Would the following be better?
> >
> >         ret = mem_cgroup_charge(folio, NULL, gfp);
> >         if (ret) {
> >                 count_mthp_stat(folio_order(folio),
> >                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
> >                 return ret;
> >         }
> >         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> >
> > Anyway, it's up to you. The code just feels a bit off to me :-)
>
> Yes, agree your version is better. I also noticed that anon and shmem increment
> FALLBACK whenever FALLBACK_CHARGE is incremented so I should apply those same
> semantics here.
>
> Thanks,
> Ryan
>
> >
> >>
> >> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>  EXPORT_SYMBOL_GPL(filemap_add_folio);
> >>
> >>  #ifdef CONFIG_NUMA
> >> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>  {
> >>         int n;
> >>         struct folio *folio;
> >> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>         }
> >>         return folio_alloc_noprof(gfp, order);
> >>  }
> >> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
> >> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
> >>  #endif
> >>
> >>  /*
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 578ac212c172..26d558e3e80f 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
> >>         .attrs = anon_stats_attrs,
> >>  };
> >>
> >> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> >> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> >> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> >> +
> >>  static struct attribute *file_stats_attrs[] = {
> >> +       &file_alloc_attr.attr,
> >> +       &file_fallback_attr.attr,
> >> +       &file_fallback_charge_attr.attr,
> >>  #ifdef CONFIG_SHMEM
> >>         &shmem_alloc_attr.attr,
> >>         &shmem_fallback_attr.attr,
> >> --
> >> 2.43.0
> >>
> >

Thanks
Barry