> -----Original Message-----
> From: Mike Kravetz [mailto:mike.kravetz@xxxxxxxxxx]
> Sent: Wednesday, July 15, 2020 11:21 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> akpm@xxxxxxxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> Linuxarm <linuxarm@xxxxxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> Roman Gushchin <guro@xxxxxx>; Catalin Marinas <catalin.marinas@xxxxxxx>;
> Will Deacon <will@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> Ingo Molnar <mingo@xxxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>;
> H. Peter Anvin <hpa@xxxxxxxxx>; Mike Rapoport <rppt@xxxxxxxxxxxxx>;
> Anshuman Khandual <anshuman.khandual@xxxxxxx>;
> Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> Subject: Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory
>
> On 7/10/20 5:09 AM, Barry Song wrote:
> > Online nodes are not necessarily nodes that contain memory. Splitting
> > hugetlb_cma over online nodes can make the hugetlb_cma size
> > inconsistent with the user setting. For example, on a system with 4
> > NUMA nodes of which only one has memory, if users set hugetlb_cma to
> > 4GB, it will be split into four 1GB chunks. Only the node with memory
> > gets its 1GB of CMA; the other three nodes get nothing. That means
> > the whole system gets only 1GB of CMA while users asked for 4GB.
> >
> > Thus, it is more sensible to split hugetlb_cma over the nodes that
> > have memory. For the above case, the only node with memory will
> > reserve 4GB of CMA, which matches the user setting in the bootargs.
> > In order to split the CMA over nodes with memory,
> > hugetlb_cma_reserve() should scan over nodes in the N_MEMORY state
> > rather than the N_ONLINE state. That means the function must be
> > called only after the arch code has finished setting the N_MEMORY
> > state of the nodes.
> >
> > The problem exists whenever N_ONLINE != N_MEMORY, so it is a general
> > problem on all platforms, though there are trivial differences among
> > architectures. For example, on ARM64 all nodes have already reached
> > the N_ONLINE state before hugetlb_cma_reserve() is called, so hugetlb
> > gets an inconsistent CMA size when some online nodes have no memory.
> > On x86 the problem is hidden because x86 happens to set N_ONLINE only
> > on the nodes with memory by the time hugetlb_cma_reserve() is called.
> >
> > Anyway, this patch moves hugetlb_cma_reserve() to scanning N_MEMORY
> > and lets both x86 and ARM64 call the function after the N_MEMORY
> > state is ready. It also documents the requirement in the definition
> > of hugetlb_cma_reserve().
> >
> > Cc: Roman Gushchin <guro@xxxxxx>
> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Cc: Will Deacon <will@xxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> > Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> > Signed-off-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
>
> I agree we should only be concerned with N_MEMORY nodes for the CMA
> reservations. However, this patch got me thinking:
> - Do we really have to initiate the CMA reservations from arch
>   specific code?
> - Can we move the call to reserve CMA a little later into hugetlb
>   arch independent code?
>
> I know the cma_declare_contiguous_nid() routine says it should be
> called from arch specific code. However, unless I am missing
> something, that seems mostly about timing.
>
> What about a change like this on top of this patch?
>
> From 72b5b9a623f8711ad7f79f1a8f910906245f5d07 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Tue, 14 Jul 2020 15:54:46 -0700
> Subject: [PATCH] hugetlb: move cma allocation call to arch independent code
>
> Instead of calling hugetlb_cma_reserve() from arch specific code,
> call from arch independent code when a gigantic page hstate is
> created. This is late enough in the init process that all numa
> memory information should be initialized. And, it is early enough
> to still use the early memory allocator.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  arch/arm64/mm/init.c    | 10 ----------
>  arch/x86/kernel/setup.c |  9 ---------
>  mm/hugetlb.c            |  8 +++++++-
>  3 files changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 79806732f4b4..ff0ff584dde9 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -427,16 +427,6 @@ void __init bootmem_init(void)
>  	sparse_init();
>  	zone_sizes_init(min, max);
>
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -#ifdef CONFIG_ARM64_4K_PAGES
> -	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -#endif
> -
>  	memblock_dump_all();
>  }
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a1a9712090ae..111c8467fafa 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1177,15 +1177,6 @@ void __init setup_arch(char **cmdline_p)
>
>  	x86_init.paging.pagetable_init();
>
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -	if (boot_cpu_has(X86_FEATURE_GBPAGES))
> -		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -
>  	kasan_init();
>
>  	/*
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f24acb3af741..a0007d1d12d2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);
>
> +	if (order >= MAX_ORDER && hugetlb_cma_size)
> +		hugetlb_cma_reserve(order);

Hello Mike,

I am not sure this is necessarily correct, as the order of gigantic
pages is arch-dependent:
https://lkml.org/lkml/2020/7/1/14

> +
>  	parsed_hstate = h;
>  }
>
> @@ -5647,7 +5650,10 @@ void __init hugetlb_cma_reserve(int order)
>  	unsigned long size, reserved, per_node;
>  	int nid;
>
> -	cma_reserve_called = true;
> +	if (cma_reserve_called)
> +		return;
> +	else
> +		cma_reserve_called = true;
>
>  	if (!hugetlb_cma_size)
>  		return;

Thanks
Barry
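
P.S. To make the concern concrete: whether "order >= MAX_ORDER" picks
out the page size an architecture actually wants CMA for depends on the
page size and config. Below is a rough stand-alone sketch; the constants
are values I am assuming for arm64 with 64K pages
(CONFIG_ARM64_64K_PAGES), not something taken from this patch, so please
double-check them:

#include <stdio.h>

/* Assumed example values for arm64 with 64K base pages */
#define PAGE_SHIFT	16	/* 64K base pages */
#define PMD_SHIFT	29	/* 512M PMD huge pages */
#define CONT_PMD_SHIFT	34	/* 16G contiguous-PMD huge pages */
#define MAX_ORDER	14	/* FORCE_MAX_ZONEORDER for arm64/64K */

int main(void)
{
	int pmd_order = PMD_SHIFT - PAGE_SHIFT;			/* 13 */
	int cont_pmd_order = CONT_PMD_SHIFT - PAGE_SHIFT;	/* 18 */

	/* 512M hstate: 13 < 14, not treated as gigantic, no CMA call */
	printf("512M: order %d, triggers CMA reserve: %d\n",
	       pmd_order, pmd_order >= MAX_ORDER);

	/*
	 * 16G hstate: 18 >= 14, so hugetlb_add_hstate() would call
	 * hugetlb_cma_reserve(18) and size the CMA area for 16G pages,
	 * although the arch code today only enables hugetlb_cma for
	 * the 4K-page PUD (1G) case.
	 */
	printf("16G: order %d, triggers CMA reserve: %d\n",
	       cont_pmd_order, cont_pmd_order >= MAX_ORDER);
	return 0;
}

And since hugetlb_cma_reserve() now returns early on the second call,
which gigantic hstate the CMA area is sized for would depend on the
order in which the hstates happen to be registered.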