On 12/08/23 10:52, Gang Li wrote:
> Hi all, hugetlb init parallelization has now been updated to v2.

Thanks for your efforts, and sorry for my late comments.

> To David Hildenbrand: padata multithread utilities has been used to reduce
> code complexity.
>
> To David Rientjes: The patch for measuring time will be separately included
> in the reply. Please test during your free time, thanks.
>
> # Introduction
> Hugetlb initialization during boot takes up a considerable amount of time.
> For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2
> seconds out of 10 seconds. Initializing 11,776 1GB pages on a 12TB Intel
> host takes 65.2 seconds [1], which is 17.4% of the total 373.78 seconds boot
> time. This is a noteworthy figure.

One issue to be concerned with is hugetlb page allocation on systems with
imbalanced NUMA node memory. Commit f60858f9d327 ("hugetlbfs: don't retry
when pool page allocations start to fail") was added to deal with issues
reported on such systems, so users are certainly using hugetlb pages on
systems with imbalanced node memory.

If allocations are performed in parallel, I believe we would want the total
number of hugetlb pages allocated to be the same as it is today. For
example, consider a simple 2-node system with 16GB total memory:

  node 0:  2GB
  node 1: 14GB

With today's code, allocating 6656 2MB pages via the kernel command line
results in:

  node 0:  924 pages
  node 1: 5732 pages
  total:  6656 pages

With the code to parallelize allocations in this series:

  node 0:  924 pages
  node 1: 1547 pages
  total:  2471 pages
--
Mike Kravetz
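The shortfall above can be illustrated with a small model (a sketch, not
kernel code). Today's boot-time allocator effectively round-robins over the
nodes and, when allocation fails on one node, drops it from the mask so the
remaining nodes absorb the shortfall; a naive per-node parallel split stops
at each node's local limit with no cross-node fallback. The per-node
capacities below are illustrative assumptions chosen to mimic the 2GB/14GB
example; the exact numbers produced by the series may differ in detail.

```python
def interleaved_alloc(capacity, requested):
    """Model of today's behavior: round-robin over nodes; a node whose
    allocation fails is dropped and the others absorb the shortfall."""
    allocated = [0] * len(capacity)
    live = set(range(len(capacity)))
    total = 0
    while total < requested and live:
        for node in sorted(live):  # sorted() copies, so discard() is safe
            if total == requested:
                break
            if allocated[node] < capacity[node]:
                allocated[node] += 1
                total += 1
            else:
                live.discard(node)  # allocation failed on this node
    return allocated

def per_node_split(capacity, requested):
    """Model of a naive parallel split: each node independently allocates
    requested/nr_nodes pages, stopping at its own capacity."""
    share = requested // len(capacity)
    return [min(share, cap) for cap in capacity]

# Illustrative capacities in 2MB pages (assumed, mimicking 2GB vs 14GB nodes)
capacity = [924, 7168]
print(interleaved_alloc(capacity, 6656))  # -> [924, 5732]: full 6656 pages
print(per_node_split(capacity, 6656))     # -> [924, 3328]: short of 6656
```

The model shows the mechanism, not the series' exact result: once a worker's
allocation target is node-local, pages that node 0 cannot supply are never
retried on node 1, so the requested total is not reached.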