On Tue, Jan 28, 2025 at 9:52 AM Frank van der Linden <fvdl@xxxxxxxxxx> wrote: > > Hi Christophe, thanks for your comments. Replies inline below. > > On Tue, Jan 28, 2025 at 12:55 AM Christophe Leroy > <christophe.leroy@xxxxxxxxxx> wrote: > > > > > > > > Le 28/01/2025 à 00:22, Frank van der Linden a écrit : > > > If hugetlb_cma_only is enabled, we know that hugetlb pages > > > can only be allocated from CMA. Now that there is an interface > > > to do early reservations from a CMA area (returning memblock > > > memory), it can be used to allocate hugetlb pages from CMA. > > > > > > This also allows for doing pre-HVO on these pages (if enabled). > > > > > > Make sure to initialize the page structures and associated data > > > correctly. Create a flag to signal that a hugetlb page has been > > > allocated from CMA to make things a little easier. > > > > > > Some configurations of powerpc have a special hugetlb bootmem > > > allocator, so introduce a boolean arch_specific_huge_bootmem_alloc > > > that returns true if such an allocator is present. In that case, > > > CMA bootmem allocations can't be used, so check that function > > > before trying. > > > > > > Cc: Madhavan Srinivasan <maddy@xxxxxxxxxxxxx> > > > Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> > > > Cc: linuxppc-dev@xxxxxxxxxxxxxxxx > > > Signed-off-by: Frank van der Linden <fvdl@xxxxxxxxxx> > > > --- > > > arch/powerpc/mm/hugetlbpage.c | 5 ++ > > > include/linux/hugetlb.h | 7 ++ > > > mm/hugetlb.c | 135 +++++++++++++++++++++++++--------- > > > 3 files changed, 114 insertions(+), 33 deletions(-) > > > > > > diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c > > > index d3c1b749dcfc..e53e4b4c8ef6 100644 > > > --- a/arch/powerpc/mm/hugetlbpage.c > > > +++ b/arch/powerpc/mm/hugetlbpage.c > > > @@ -121,6 +121,11 @@ bool __init hugetlb_node_alloc_supported(void) > > > { > > > return false; > > > } > > > + > > > +bool __init arch_specific_huge_bootmem_alloc(struct hstate *h) > > > +{ > > > + return (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled()); > > > +} > > > #endif > > > > > > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > > index 2512463bca49..bca3052fb175 100644 > > > --- a/include/linux/hugetlb.h > > > +++ b/include/linux/hugetlb.h > > > @@ -591,6 +591,7 @@ enum hugetlb_page_flags { > > > HPG_freed, > > > HPG_vmemmap_optimized, > > > HPG_raw_hwp_unreliable, > > > + HPG_cma, > > > __NR_HPAGEFLAGS, > > > }; > > > > > > @@ -650,6 +651,7 @@ HPAGEFLAG(Temporary, temporary) > > > HPAGEFLAG(Freed, freed) > > > HPAGEFLAG(VmemmapOptimized, vmemmap_optimized) > > > HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable) > > > +HPAGEFLAG(Cma, cma) > > > > > > #ifdef CONFIG_HUGETLB_PAGE > > > > > > @@ -678,14 +680,18 @@ struct hstate { > > > char name[HSTATE_NAME_LEN]; > > > }; > > > > > > +struct cma; > > > + > > > struct huge_bootmem_page { > > > struct list_head list; > > > struct hstate *hstate; > > > unsigned long flags; > > > + struct cma *cma; > > > }; > > > > > > #define HUGE_BOOTMEM_HVO 0x0001 > > > #define HUGE_BOOTMEM_ZONES_VALID 0x0002 > > > +#define HUGE_BOOTMEM_CMA 0x0004 > > > > > > bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m); > > > > > > @@ -711,6 +717,7 @@ bool __init hugetlb_node_alloc_supported(void); > > > > > > void __init hugetlb_add_hstate(unsigned order); > > > bool __init arch_hugetlb_valid_size(unsigned long size); > > > +bool __init arch_specific_huge_bootmem_alloc(struct hstate *h); > > > > Why adding 'specific' in the name ? Prefixing a function name with arch_ > > is usually enough to denote an architecture specific function. > > True, yes. That should probably be arch_has_huge_bootmem_alloc, I will > change that. > > > > > > struct hstate *size_to_hstate(unsigned long size); > > > > > > #ifndef HUGE_MAX_HSTATE > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > index 32ebde9039e2..183e8d0c2fb4 100644 > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -61,7 +61,7 @@ static struct cma *hugetlb_cma[MAX_NUMNODES]; > > > static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata; > > > #endif > > > static bool hugetlb_cma_only; > > > -static unsigned long hugetlb_cma_size __initdata; > > > +static unsigned long hugetlb_cma_size; > > > > Why remove __initdata ? As far as I can see it is used only in > > hugetlb_early_cma() and hugetlb_hstate_alloc_pages() which are __init > > functions. > > hugetlb_cma_size is now used in alloc_gigantic_folio(), which is not > an __init function. However, you got me thinking: since > hugetlb_cma_only is only effective when hugetlb_cma_size != 0, I can > just reset hugetlb_cma_only to false if hugetlb_cma_size == 0 after > parsing the commandline arguments. This will revert hugetlb_cma_size > to __initdata, and simplifies things a bit. I'll make that change in > v2. > > > > > > > > __initdata struct list_head huge_boot_pages[MAX_NUMNODES]; > > > __initdata unsigned long hstate_boot_nrinvalid[HUGE_MAX_HSTATE]; > > > @@ -132,8 +132,10 @@ static void hugetlb_free_folio(struct folio *folio) > > > #ifdef CONFIG_CMA > > > int nid = folio_nid(folio); > > > > > > - if (cma_free_folio(hugetlb_cma[nid], folio)) > > > + if (folio_test_hugetlb_cma(folio)) { > > > + WARN_ON(!cma_free_folio(hugetlb_cma[nid], folio)); > > > > Is that WARN_ON() needed ? See > > https://docs.kernel.org/process/coding-style.html#do-not-crash-the-kernel > > Not strictly, I suppose, but if there is a CMA-allocated hugetlb > folio, and cma_free fails, that would be a condition worthy of a > warning, as the flag somehow got corrupted or there is an internal CMA > error. How about WARN_ON_ONCE? > > > > > > > > return; > > > + } > > > #endif > > > folio_put(folio); > > > } > > > @@ -1509,6 +1511,9 @@ static struct folio *alloc_gigantic_folio(struct hstate *h, gfp_t gfp_mask, > > > break; > > > } > > > } > > > + > > > + if (folio) > > > + folio_set_hugetlb_cma(folio); > > > } > > > #endif > > > if (!folio) { > > > @@ -3175,6 +3180,63 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, > > > return ERR_PTR(-ENOSPC); > > > } > > > > > > +/* > > > + * Some architectures do their own bootmem allocation, so they can't use > > > + * early CMA allocation. So, allow for this function to be redefined. > > > + */ > > > +bool __init __attribute((weak)) > > > > Can't you use __weak ? > > > > By the way, do we really need a weak function here ? Can't it be a > > static inline helper that gets waived by a macro defined by the arch, > > something like: > > > > #ifndef arch_huge_bootmem_alloc > > static inline arch_huge_bootmem_alloc(struct hstate *h) > > { > > return false; > > } > > #endif > > > > Then powerpc does: > > > > #define arch_huge_bootmem_alloc arch_huge_bootmem_alloc > > static inline arch_huge_bootmem_alloc(struct hstate *h) > > { > > return (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled()); > > } > > > > Fair enough, yeah. I used a weak symbol because that was already used > for powerpc with alloc_bootmem_huge_page(), but I can change this one. > > > > > But why is struct hstate *h parameter needed ? Seems like noone uses it. > > Correct - I merely extrapolated a bit and thought "well, architectures > might have bootmem hugetlb allocators that only deal with certain > sizes". But then again, like you say, there is currently no need for > it. I'll remove the argument. > > > > > > +arch_specific_huge_bootmem_alloc(struct hstate *h) > > > +{ > > > + return false; > > > +} > > > + > > > +static bool __init hugetlb_early_cma(struct hstate *h) > > > +{ > > > + if (arch_specific_huge_bootmem_alloc(h)) > > > + return false; > > > + > > > + return (hstate_is_gigantic(h) && hugetlb_cma_size && hugetlb_cma_only); > > > +} > > > + > > > +static __init void *alloc_bootmem(struct hstate *h, int nid) > > > +{ > > > + struct huge_bootmem_page *m; > > > + unsigned long flags; > > > + struct cma *cma; > > > + > > > +#ifdef CONFIG_CMA > > > > #ifdefs in C files should be avoided, see > > https://docs.kernel.org/process/coding-style.html#conditional-compilation > > > > > + if (hugetlb_early_cma(h)) { > > > + flags = HUGE_BOOTMEM_CMA; > > > + cma = hugetlb_cma[nid]; > > > + m = cma_reserve_early(cma, huge_page_size(h)); > > > + } else > > > +#endif > > > > This kind of if/else construct in uggly, should be avoided. > > > > I found this ifdef hard to avoid, sadly, I tried various ways to avoid > it (If (IS_ENABLED(CONFIG_CMA), etc), but came up short. I'll have > another look for v2, but short of trying to split off all CMA-related > code in to a different file, which would definitely be out of scope > here, it might not end up being better. I ended up moving the hugetlb_cma code to its own file anyway, which is a new patch at the end of the v2 series I just sent. - Frank