On Thu 02-04-20 17:20:01, Vlastimil Babka wrote:
[...]
> FWIW, for review purposes, this is Roman's patch with all followups from
> mmotm/next (hopefully didn't miss any) and then squashed with patch 2/2 from
> this thread. It can be applied like this:
>
> - checkout v5.6
> - apply patch 1/2 from this thread
> - apply below

Thanks!

> ----8<----
> From dc10a593f2b8dfc7be920b4b088a8d55068fc6bc Mon Sep 17 00:00:00 2001
> From: Roman Gushchin <guro@xxxxxx>
> Date: Thu, 2 Apr 2020 13:49:04 +1100
> Subject: [PATCH] mm: hugetlb: optionally allocate gigantic hugepages using cma
>
> Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at
> runtime") has added the run-time allocation of gigantic pages. However it
> actually works only at the early stages of system boot, when the
> majority of memory is free. After some time the memory gets fragmented by
> non-movable pages, so the chances to find a contiguous 1 GB block are
> getting close to zero. Even dropping caches manually doesn't help a lot.
>
> At large scale, rebooting servers in order to allocate gigantic hugepages
> is quite expensive and complex. At the same time, keeping some constant
> percentage of memory in reserved hugepages even if the workload isn't
> using it is a big waste: not all workloads can benefit from using 1 GB
> pages.
>
> The following solution can solve the problem:
> 1) At boot time a dedicated cma area* is reserved. The size is passed
>    as a kernel argument.
> 2) Run-time allocations of gigantic hugepages are performed using the
>    cma allocator and the dedicated cma area.
>
> In this case gigantic hugepages can be allocated successfully with a high
> probability; at the same time, the memory isn't completely wasted if
> nobody is using 1GB hugepages: it can be used for pagecache, anon memory,
> THPs, etc.
>
> * On a multi-node machine a per-node cma area is allocated on each node.
>   Subsequent gigantic hugetlb allocations use the first available numa
>   node if the mask isn't specified by the user.
>
> Usage:
> 1) configure the kernel to allocate a cma area for hugetlb allocations:
>    pass hugetlb_cma=10G as a kernel argument
>
> 2) allocate hugetlb pages as usual, e.g.
>    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> If the option isn't enabled or the allocation of the cma area failed,
> the current behavior of the system is preserved.
>
> x86 and arm64 are covered by this patch, other architectures can be
> trivially added later.
>
> Link: http://lkml.kernel.org/r/20200311220920.2487528-1-guro@xxxxxx
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Tested-by: Andreas Schaufler <andreas.schaufler@xxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Andreas Schaufler <andreas.schaufler@xxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Joonsoo Kim <js1304@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
>
> mm: hugetlb: Use node interface of cma
>
> With the introduction of the numa node interface for CMA, this patch uses
> that interface to allocate memory on numa nodes if NUMA is configured.
> This is more efficient and cleaner because, first, instead of iterating
> over the memory range of each numa node, cma_declare_contiguous_nid()
> does its own address finding if we pass 0 for both min_pfn and max_pfn,
> and second, it can also handle cases where NUMA is not configured by
> passing NUMA_NO_NODE as an argument.
>
> In addition, the check whether the desired size of memory is available
> happens in cma_declare_contiguous_nid(), because base and limit are
> determined there, since 0 (any) is passed for both base and limit as
> arguments to the function.
>
> Signed-off-by: Aslan Bakirov <aslan@xxxxxx>
> Acked-by: Roman Gushchin <guro@xxxxxx>
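For context, the runtime allocation side of the squashed patch is not
quoted below; it boils down to trying the pre-reserved per-node CMA areas
first and only then falling back to the regular contiguous allocator. A
simplified sketch based on the description above (not the exact hunk,
details may differ):

static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
		int nid, nodemask_t *nodemask)
{
	unsigned long nr_pages = 1UL << huge_page_order(h);

#ifdef CONFIG_CMA
	{
		struct page *page;
		int node;

		/* try the dedicated per-node CMA areas first */
		for_each_node_mask(node, *nodemask) {
			if (!hugetlb_cma[node])
				continue;

			page = cma_alloc(hugetlb_cma[node], nr_pages,
					 huge_page_order(h), true);
			if (page)
				return page;
		}
	}
#endif
	/* fall back to the generic contiguous allocator */
	return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
}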
Minor nit below. For the squashed version feel free to add
Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  .../admin-guide/kernel-parameters.txt |  7 ++
>  arch/arm64/mm/init.c                  |  6 ++
>  arch/x86/kernel/setup.c               |  4 +
>  include/linux/hugetlb.h               |  8 ++
>  mm/hugetlb.c                          | 98 +++++++++++++++++++
>  5 files changed, 123 insertions(+)
>
[...]
> +	reserved = 0;
> +	for_each_node_state(nid, N_ONLINE) {
> +		int res;
> +
> +		size = min(per_node, hugetlb_cma_size - reserved);
> +		size = round_up(size, PAGE_SIZE << order);
> +
> +
> +#ifndef CONFIG_NUMA
> +		nid = NUMA_NO_NODE;
> +#endif

This can be dropped. UMA will simply use node 0 and the memblock
allocator will just do the right thing.

> +		res = cma_declare_contiguous_nid(0, size,
> +						 0,
> +						 PAGE_SIZE << order,
> +						 0, false,
> +						 "hugetlb", &hugetlb_cma[nid], nid);
> +
> +		if (res) {
> +			pr_warn("%s: reservation failed: err %d, node %d\n",
> +				__func__, res, nid);
> +			break;
> +		}
> +
> +		reserved += size;
> +		pr_info("hugetlb_cma: reserved %lu MiB on node %d\n",
> +			size / SZ_1M, nid);
> +
> +		if (reserved >= hugetlb_cma_size)
> +			break;
> +	}
> +}
> +
> +#endif /* CONFIG_CMA */
> -- 
> 2.26.0

-- 
Michal Hocko
SUSE Labs
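With the #ifndef CONFIG_NUMA block dropped as suggested above, the
reservation loop would read roughly as follows (a sketch; the final
commit may format things differently):

	reserved = 0;
	for_each_node_state(nid, N_ONLINE) {
		int res;

		size = min(per_node, hugetlb_cma_size - reserved);
		size = round_up(size, PAGE_SIZE << order);

		/*
		 * No NUMA_NO_NODE special case is needed: on UMA kernels
		 * the only online node is 0 and the memblock allocator
		 * does the right thing with it.
		 */
		res = cma_declare_contiguous_nid(0, size, 0,
						 PAGE_SIZE << order,
						 0, false, "hugetlb",
						 &hugetlb_cma[nid], nid);
		if (res) {
			pr_warn("%s: reservation failed: err %d, node %d\n",
				__func__, res, nid);
			break;
		}

		reserved += size;
		pr_info("hugetlb_cma: reserved %lu MiB on node %d\n",
			size / SZ_1M, nid);

		if (reserved >= hugetlb_cma_size)
			break;
	}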