Allocating huge pages can take a very long time on servers with
terabytes of memory, even when they are allocated at boot time, where
the allocation happens in parallel.

The kernel currently uses a hard-coded value of 2 threads per NUMA node
for these allocations. This value might have been good enough in the
past, but it is not sufficient to fully utilize newer systems.

This series makes the value overridable via a new kernel command-line
option, hugetlb_alloc_threads.

We tested this on 2 generations of Xeon CPUs, and the results show a
big improvement in the overall allocation time:

+--------------------+-------+-------+-------+-------+-------+
| threads per node   |   2   |   4   |   8   |  16   |  32   |
+--------------------+-------+-------+-------+-------+-------+
| skylake 4node      |  44s  |  22s  |  16s  |  19s  |  20s  |
| cascade lake 4node |  39s  |  20s  |  11s  |  10s  |   9s  |
+--------------------+-------+-------+-------+-------+-------+

On Skylake, we see an improvement of 2.75x (44s -> 16s) when using
8 threads per node. On Cascade Lake, the gain is even larger at 4.3x
(39s -> 9s) when using 32 threads per node.

This speedup is quite significant, and users of large machines like
these should have the option to make them boot as fast as possible.

Signed-off-by: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>
---
Thomas Prescher (2):
      mm: hugetlb: add hugetlb_alloc_threads cmdline option
      mm: hugetlb: log time needed to allocate hugepages

 Documentation/admin-guide/kernel-parameters.txt |  7 +++
 Documentation/admin-guide/mm/hugetlbpage.rst    |  9 +++-
 mm/hugetlb.c                                    | 59 ++++++++++++++++++-------
 3 files changed, 58 insertions(+), 17 deletions(-)
---
base-commit: 334426094588f8179fe175a09ecc887ff0c75758
change-id: 20250221-hugepage-parameter-e8542fdfc0ae

Best regards,
-- 
Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>