On Tue 07-04-20 09:38:40, Roman Gushchin wrote:
> Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at
> runtime") has added the run-time allocation of gigantic pages. However it
> actually works only at early stages of the system loading, when the
> majority of memory is free. After some time the memory gets fragmented by
> non-movable pages, so the chances to find a contiguous 1 GB block are
> getting close to zero. Even dropping caches manually doesn't help a lot.
>
> At large scale rebooting servers in order to allocate gigantic hugepages
> is quite expensive and complex. At the same time keeping some constant
> percentage of memory in reserved hugepages even if the workload isn't
> using it is a big waste: not all workloads can benefit from using 1 GB
> pages.
>
> The following solution can solve the problem:
> 1) On boot time a dedicated cma area* is reserved. The size is passed
>    as a kernel argument.
> 2) Run-time allocations of gigantic hugepages are performed using the
>    cma allocator and the dedicated cma area
>
> In this case gigantic hugepages can be allocated successfully with a high
> probability, however the memory isn't completely wasted if nobody is using
> 1GB hugepages: it can be used for pagecache, anon memory, THPs, etc.
>
> * On a multi-node machine a per-node cma area is allocated on each node.
>   Following gigantic hugetlb allocation are using the first available
>   numa node if the mask isn't specified by a user.
>
> Usage:
> 1) configure the kernel to allocate a cma area for hugetlb allocations:
>    pass hugetlb_cma=10G as a kernel argument
>
> 2) allocate hugetlb pages as usual, e.g.
>    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> If the option isn't enabled or the allocation of the cma area failed,
> the current behavior of the system is preserved.
>
> x86 and arm-64 are covered by this patch, other architectures can be
> trivially added later.
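
As an illustration of usage step 2 above, here is a minimal shell sketch that writes the requested page count and then reads the file back, since the pool can end up smaller than requested when some allocations fail. The sysfs path is the one from the quoted message. The function name `request_gigantic_pages` and its optional second argument (an alternative file path, handy for dry runs) are assumptions made for this example, not part of the patch.

```shell
#!/bin/sh
# Sketch only: request 1 GB hugetlb pages and report how many were granted.
# The sysfs path comes from the commit message; the function name and the
# overridable path argument are hypothetical helpers for illustration.

NR_FILE_DEFAULT=/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

request_gigantic_pages() {
    want=$1
    nr_file=${2:-$NR_FILE_DEFAULT}

    # Writing the desired count asks the kernel to resize the pool.
    echo "$want" > "$nr_file" || return 1

    # Reading the file back shows how many pages are actually in the pool.
    got=$(cat "$nr_file")
    echo "requested=$want allocated=$got"

    # Succeed only if the pool holds at least the requested number of pages.
    [ "$got" -ge "$want" ]
}
```

On a real system this would be run as root, for example `request_gigantic_pages 10`, after booting with `hugetlb_cma=10G`. A non-zero exit status means the pool holds fewer pages than requested.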
>
> The patch contains clean-ups and fixes proposed and implemented by
> Aslan Bakirov and Randy Dunlap. It also contains ideas and suggestions
> proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Tested-by: Andreas Schaufler <andreas.schaufler@xxxxxx>
> Acked-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Aslan Bakirov <aslan@xxxxxx>
> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Joonsoo Kim <js1304@xxxxxxxxx>

Thanks a lot for addressing my review feedback!
--
Michal Hocko
SUSE Labs