Hi Mike,

On Tue, Jul 14, 2020 at 04:21:01PM -0700, Mike Kravetz wrote:
> I agree we should only be concerned with N_MEMORY nodes for the CMA
> reservations. However, this patch got me thinking:
> - Do we really have to initiate the CMA reservations from arch specific code?
> - Can we move the call to reserve CMA a little later into hugetlb arch
>   independent code?
>
> I know the cma_declare_contiguous_nid() routine says it should be called
> from arch specific code. However, unless I am missing something, that seems
> mostly about timing.
>
> What about a change like this on top of this patch?
>
> From 72b5b9a623f8711ad7f79f1a8f910906245f5d07 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Tue, 14 Jul 2020 15:54:46 -0700
> Subject: [PATCH] hugetlb: move cma allocation call to arch independent code
>
> Instead of calling hugetlb_cma_reserve() from arch specific code,
> call from arch independent code when a gigantic page hstate is
> created. This is late enough in the init process that all numa
> memory information should be initialized. And, it is early enough
> to still use the early memory allocator.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  arch/arm64/mm/init.c    | 10 ----------
>  arch/x86/kernel/setup.c |  9 ---------
>  mm/hugetlb.c            |  8 +++++++-
>  3 files changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 79806732f4b4..ff0ff584dde9 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -427,16 +427,6 @@ void __init bootmem_init(void)
>  	sparse_init();
>  	zone_sizes_init(min, max);
>
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -#ifdef CONFIG_ARM64_4K_PAGES
> -	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -#endif
> -
>  	memblock_dump_all();
>  }
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a1a9712090ae..111c8467fafa 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1177,15 +1177,6 @@ void __init setup_arch(char **cmdline_p)
>
>  	x86_init.paging.pagetable_init();
>
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -	if (boot_cpu_has(X86_FEATURE_GBPAGES))
> -		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -
>  	kasan_init();
>
>  	/*
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f24acb3af741..a0007d1d12d2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);

(nit: you can also make hugetlb_cma_reserve() static and remove its
function prototypes from hugetlb.h)

> +	if (order >= MAX_ORDER && hugetlb_cma_size)
> +		hugetlb_cma_reserve(order);

Although I really like the idea of moving this out of the arch code, I
don't quite follow the check against MAX_ORDER here -- it looks like a
bit of a hack to try to intercept the "PUD_SHIFT - PAGE_SHIFT" order
which we currently pass to hugetlb_cma_reserve().
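To make the arithmetic concrete (my numbers, assuming 4k pages and the
default MAX_ORDER of 11 -- shout if the configs you care about differ):

	/*
	 * PUD_SHIFT = 30, PAGE_SHIFT = 12, so today we pass in
	 * order = PUD_SHIFT - PAGE_SHIFT = 18. Since 18 >= MAX_ORDER,
	 * "order >= MAX_ORDER" matches any gigantic hstate, not just
	 * the PUD-sized one.
	 */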
Maybe we could instead have something like:

#ifndef HUGETLB_CMA_ORDER
#define HUGETLB_CMA_ORDER	(PUD_SHIFT - PAGE_SHIFT)
#endif

and then just do:

	if (order == HUGETLB_CMA_ORDER)
		hugetlb_cma_reserve(order);

? Is there something else I'm missing?

> +
>  	parsed_hstate = h;
>  }
>
> @@ -5647,7 +5650,10 @@ void __init hugetlb_cma_reserve(int order)
>  	unsigned long size, reserved, per_node;
>  	int nid;
>
> -	cma_reserve_called = true;
> +	if (cma_reserve_called)
> +		return;
> +	else
> +		cma_reserve_called = true;

(nit: don't need the 'else' here)
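i.e. just (untested sketch):

	if (cma_reserve_called)
		return;
	cma_reserve_called = true;

Will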