On Fri, Apr 03, 2020 at 08:56:57AM -0700, Mike Kravetz wrote:
> On 3/11/20 3:09 PM, Roman Gushchin wrote:
> > At large scale rebooting servers in order to allocate gigantic hugepages
> > is quite expensive and complex. At the same time keeping some constant
> > percentage of memory in reserved hugepages even if the workload isn't
> > using it is a big waste: not all workloads can benefit from using 1 GB
> > pages.
> >
> > The following solution can solve the problem:
> > 1) On boot time a dedicated cma area* is reserved. The size is passed
> >    as a kernel argument.
> > 2) Run-time allocations of gigantic hugepages are performed using the
> >    cma allocator and the dedicated cma area
> >
> > In this case gigantic hugepages can be allocated successfully with a
> > high probability, however the memory isn't completely wasted if nobody
> > is using 1GB hugepages: it can be used for pagecache, anon memory,
> > THPs, etc.
> >
> > * On a multi-node machine a per-node cma area is allocated on each node.
> >   Following gigantic hugetlb allocation are using the first available
> >   numa node if the mask isn't specified by a user.
> >
> > Usage:
> > 1) configure the kernel to allocate a cma area for hugetlb allocations:
> >    pass hugetlb_cma=10G as a kernel argument
> >
> > 2) allocate hugetlb pages as usual, e.g.
> >    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >
> > If the option isn't enabled or the allocation of the cma area failed,
> > the current behavior of the system is preserved.
> >
> > x86 and arm-64 are covered by this patch, other architectures can be
> > trivially added later.
> >
> > v3:
> > - added fallback to the existing allocation mechanism
> > - added min/max checks
> > - switched to MiB in debug output
> > - removed percentage option
> > - added arch-specific order argument to determine an alignment
> > - added arm support
> > - fixed the !CONFIG_HUGETLBFS build
> >
> > Thanks to Michal, Mike, Andreas and Rik for ideas and suggestions!
> >
> > v2:
> > -fixed !CONFIG_CMA build, suggested by Andrew Morton
> >
> > Signed-off-by: Roman Gushchin <guro@xxxxxx>
>
> It is a bit difficult to keep track of all the followup patches. One
> small issue below.

I agree. There were a dozen cleanups and fixes from several other people,
so it's a bit messy now. I'll merge it all together (including the
documentation fixes you proposed) and resend as soon as I figure out
the hugetlb/cma locking issue.

>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |   7 ++
> >  arch/arm64/mm/init.c                          |   6 +
> >  arch/x86/kernel/setup.c                       |   4 +
> >  include/linux/hugetlb.h                       |   8 ++
> >  mm/hugetlb.c                                  | 116 ++++++++++++++++++
> >  5 files changed, 141 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 0c9894247015..9eb0df40643d 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1452,6 +1452,13 @@
> >      hpet_mmap=      [X86, HPET_MMAP] Allow userspace to mmap HPET
> >                      registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
> >
> > +    hugetlb_cma=    [x86-64] The size of a cma area used for allocation
> > +                    of gigantic hugepages.
> > +                    Format: nn[KMGTPE]
> > +
> > +                    If enabled, boot-time allocation of gigantic hugepages
> > +                    is skipped.
> > +
> >      hugepages=      [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
> >      hugepagesz=     [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
> >                      On x86-64 and powerpc, this option can be specified
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index b65dffdfb201..e42727e3568e 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/kexec.h>
> >  #include <linux/crash_dump.h>
> > +#include <linux/hugetlb.h>
> >
> >  #include <asm/boot.h>
> >  #include <asm/fixmap.h>
> > @@ -457,6 +458,11 @@ void __init arm64_memblock_init(void)
> >      high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
> >
> >      dma_contiguous_reserve(arm64_dma32_phys_limit);
> > +
> > +#ifdef CONFIG_ARM64_4K_PAGES
> > +    hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > +#endif
> > +
> >  }
>
> The documentation is already 'out of date' as you added support for arm64.
> Not a huge deal as documentation rarely keeps up with code, but we should
> at least be correct here.
>
> I have a patch series in progress which cleans up existing hugetlb command
> line processing.
> https://lore.kernel.org/linux-mm/20200401183819.20647-1-mike.kravetz@xxxxxxxxxx/
>
> No need to make any changes here, but assuming this support goes in first
> I would make the following changes as part of my series:
> - Don't list architectures in Documentation. Just say support is arch
>   dependent.
> - Introduce some mechanism to print an error if hugetlb_cma is specified
>   on the command line, but not supported by architecture. IIUC, no message
>   is printed today. IMO, this only becomes important if the documentation
>   does not list supported architectures.
>
> Not insisting that documentation be updated to include arm64.
> Acked-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

Thank you!

Roman
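
P.S. For anyone reading along, here is a rough userspace sketch of how a
value in the documented "Format: nn[KMGTPE]" maps to a number of gigantic
pages. It is only an illustration, not the code from the patch:
parse_size() and gigantic_pages_in() are hypothetical names, returning 0
for malformed input is an assumed convention, and overflow is not checked.

```c
/* Userspace illustration only; NOT the kernel's parser or the patch code. */
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a size string in the nn[KMGTPE] format into bytes.
 * Hypothetical helper: returns 0 on malformed input, and does not
 * guard against overflow of the shifted value.
 */
uint64_t parse_size(const char *s)
{
	char *end;
	uint64_t val = strtoull(s, &end, 0);
	unsigned int shift;

	if (end == s)		/* no digits were consumed */
		return 0;

	switch (*end) {
	case '\0':
		return val;	/* plain byte count, no suffix */
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	case 'T': case 't': shift = 40; break;
	case 'P': case 'p': shift = 50; break;
	case 'E': case 'e': shift = 60; break;
	default:
		return 0;	/* unknown suffix */
	}

	if (end[1] != '\0')	/* trailing garbage after the suffix */
		return 0;

	return val << shift;
}

/*
 * Number of whole gigantic pages of size (1 << page_shift) bytes that
 * fit into an area of area_bytes bytes.
 */
uint64_t gigantic_pages_in(uint64_t area_bytes, unsigned int page_shift)
{
	return area_bytes >> page_shift;
}
```

With hugetlb_cma=10G and 1 GB pages (a shift of 30), this gives 10 pages,
which matches the "echo 10" in the usage example of the commit message.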