On Fri, Aug 16, 2024 at 9:23 AM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On 13 Aug 2024, at 23:54, Yu Zhao wrote:
>
> > Use __GFP_COMP for gigantic folios to greatly reduce not only the
> > amount of code but also the allocation and free time.
> >
> > LOC (approximately): +60, -240
> >
> > Allocate and free 500 1GB hugeTLB folios without HVO by:
> >   time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >   time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >
> >         Before  After
> > Alloc   ~13s    ~10s
> > Free    ~15s    <1s
> >
> > The above magnitude generally holds for multiple x86 and arm64 CPU
> > models.
> >
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Reported-by: Frank van der Linden <fvdl@xxxxxxxxxx>
> > ---
> >  include/linux/hugetlb.h |   9 +-
> >  mm/hugetlb.c            | 293 ++++++++--------------------------
> >  2 files changed, 62 insertions(+), 240 deletions(-)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 3100a52ceb73..98c47c394b89 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -896,10 +896,11 @@ static inline bool hugepage_movable_supported(struct hstate *h)
> >  /* Movability of hugepages depends on migration support. */
> >  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> >  {
> > -	if (hugepage_movable_supported(h))
> > -		return GFP_HIGHUSER_MOVABLE;
> > -	else
> > -		return GFP_HIGHUSER;
> > +	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
> > +
> > +	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
> > +
> > +	return gfp;
> >  }
> >
> >  static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 71d469c8e711..efa77ce87dcc 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -56,16 +56,6 @@ struct hstate hstates[HUGE_MAX_HSTATE];
> >  #ifdef CONFIG_CMA
> >  static struct cma *hugetlb_cma[MAX_NUMNODES];
> >  static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
> > -static bool hugetlb_cma_folio(struct folio *folio, unsigned int order)
> > -{
> > -	return cma_pages_valid(hugetlb_cma[folio_nid(folio)], &folio->page,
> > -				1 << order);
> > -}
> > -#else
> > -static bool hugetlb_cma_folio(struct folio *folio, unsigned int order)
> > -{
> > -	return false;
> > -}
> >  #endif
> >  static unsigned long hugetlb_cma_size __initdata;
> >
> > @@ -100,6 +90,17 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
> >  		unsigned long start, unsigned long end);
> >  static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
> >
> > +static void hugetlb_free_folio(struct folio *folio)
> > +{
> > +#ifdef CONFIG_CMA
> > +	int nid = folio_nid(folio);
> > +
> > +	if (cma_free_folio(hugetlb_cma[nid], folio))
> > +		return;
> > +#endif
> > +	folio_put(folio);
> > +}
> > +
>
> It seems that we no longer use free_contig_range() to free gigantic
> folios allocated from alloc_contig_range().

We switched to two pairs of external (to the allocator) APIs in this patch:
  folio_alloc_gigantic() / folio_put()
and
  cma_alloc_folio() / cma_free_folio()

> Will it work? Or did I miss anything?

alloc_contig_range() and free_contig_range() also work with
__GFP_COMP / large folios, but this pair is internal to the allocator
and shouldn't be used directly, except to implement external APIs
like the ones above.
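
FWIW, a minimal sketch of what a caller outside the allocator looks
like with the new pairs. grab_gigantic_folio() and
drop_gigantic_folio() are made-up helper names just for illustration,
and the folio_alloc_gigantic(order, gfp, nid, nodemask) signature is
the one assumed from this series:

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Allocate a gigantic folio from node @nid. __GFP_COMP is what makes
 * the contiguous range come back as a single compound page (folio)
 * rather than a bare range of order-0 pages.
 */
static struct folio *grab_gigantic_folio(int nid, int order)
{
	gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_COMP | __GFP_NOWARN;

	return folio_alloc_gigantic(order, gfp, nid, NULL);
}

/*
 * Free it: with __GFP_COMP there is no special free path anymore --
 * dropping the last reference via folio_put() tears the folio down,
 * which is why the hugeTLB free side shrank so much in this patch.
 */
static void drop_gigantic_folio(struct folio *folio)
{
	folio_put(folio);
}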