Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

Gang Li <gang.li@xxxxxxxxx> · Tue, 12 Mar 2024 10:26:07 +0800

Thanks for your review :)

On 2024/3/9 01:35, Daniel Jordan wrote:
> On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
>> Optimizing the initialization speed of 1G huge pages through
>> parallelization.
>>
>> 1G hugetlbs are allocated from bootmem, a process that is already
>> very fast and does not currently require optimization. Therefore,
>> we focus on parallelizing only the initialization phase in
>> `gather_bootmem_prealloc`.
>>
>> Here are some test results:
>>        test case       no patch(ms)   patched(ms)   saved
>>   ------------------- -------------- ------------- --------
>>    256c2T(4 node) 1G           4745          2024   57.34%
>>    128c1T(2 node) 1G           3358          1712   49.02%
>>       12T         1G          77000         18300   76.23%
>
> Another great improvement.
>
>> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
>> +						    unsigned long end, void *arg)
>> +{
>> +	int nid;
>> +
>> +	for (nid = start; nid < end; nid++)
>> +		gather_bootmem_prealloc_node(nid);
>> +}
>>
>> +static void __init gather_bootmem_prealloc(void)
>> +{
>> +	struct padata_mt_job job = {
>> +		.thread_fn	= gather_bootmem_prealloc_parallel,
>> +		.fn_arg		= NULL,
>> +		.start		= 0,
>> +		.size		= num_node_state(N_MEMORY),
>> +		.align		= 1,
>> +		.min_chunk	= 1,
>> +		.max_threads	= num_node_state(N_MEMORY),
>> +		.numa_aware	= true,
>> +	};
>> +
>> +	padata_do_multithreaded(&job);
>> +}
>
> Looks fine from the padata side.
>
> Acked-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> # padata
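
For readers less familiar with padata, here is a rough userspace sketch of how a job described by `.start`, `.size`, `.min_chunk` and `.max_threads` could be cut into `[start, end)` ranges like the ones passed to `gather_bootmem_prealloc_parallel`. It is a simplified model, not the kernel implementation: `struct range_job`, `struct chunk` and `split_job()` are hypothetical names made up for illustration, and the sketch ignores `.align` and `.numa_aware`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the padata_mt_job fields used in the patch. */
struct range_job {
	unsigned long start;       /* first index of the range */
	unsigned long size;        /* number of indices to cover */
	unsigned long min_chunk;   /* smallest piece worth handing to a worker */
	unsigned long max_threads; /* upper bound on workers, 0 means no bound */
};

/* One contiguous piece [start, end) handed to a worker function. */
struct chunk {
	unsigned long start;
	unsigned long end;
};

/*
 * Split the job into contiguous chunks and store them in out[].
 * Returns the number of chunks written, which never exceeds cap.
 * Assumption: chunks are sized by dividing the range evenly among the
 * workers, without ever going below min_chunk.
 */
static size_t split_job(const struct range_job *job, struct chunk *out,
			size_t cap)
{
	unsigned long min_chunk = job->min_chunk ? job->min_chunk : 1;
	unsigned long nworks = job->size / min_chunk;
	unsigned long chunk_size, pos, end;
	size_t n = 0;

	/* At least one worker, at most max_threads workers. */
	if (nworks < 1)
		nworks = 1;
	if (job->max_threads && nworks > job->max_threads)
		nworks = job->max_threads;

	chunk_size = job->size / nworks;
	if (chunk_size < min_chunk)
		chunk_size = min_chunk;

	pos = job->start;
	end = job->start + job->size;
	while (pos < end && n < cap) {
		unsigned long next = pos + chunk_size;

		/* The last chunk may be shorter than chunk_size. */
		if (next > end)
			next = end;
		out[n].start = pos;
		out[n].end = next;
		n++;
		pos = next;
	}
	return n;
}
```

Under these assumptions, a 4-node job with `.start = 0`, `.size = 4`, `.min_chunk = 1` and `.max_threads = 4` yields four chunks of one index each, so each call to the thread function would loop over a single node ID.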