On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
> Optimizing the initialization speed of 1G huge pages through
> parallelization.
>
> 1G hugetlbs are allocated from bootmem, a process that is already
> very fast and does not currently require optimization. Therefore,
> we focus on parallelizing only the initialization phase in
> `gather_bootmem_prealloc`.
>
> Here are some test results:
>       test case          no patch(ms)    patched(ms)     saved
>  -------------------    --------------  -------------  --------
>   256c2T(4 node) 1G          4745           2024        57.34%
>   128c1T(2 node) 1G          3358           1712        49.02%
>       12T 1G                77000          18300        76.23%

Another great improvement.

> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
> +						    unsigned long end, void *arg)
> +{
> +	int nid;
> +
> +	for (nid = start; nid < end; nid++)
> +		gather_bootmem_prealloc_node(nid);
> +}
> +
> +static void __init gather_bootmem_prealloc(void)
> +{
> +	struct padata_mt_job job = {
> +		.thread_fn	= gather_bootmem_prealloc_parallel,
> +		.fn_arg		= NULL,
> +		.start		= 0,
> +		.size		= num_node_state(N_MEMORY),
> +		.align		= 1,
> +		.min_chunk	= 1,
> +		.max_threads	= num_node_state(N_MEMORY),
> +		.numa_aware	= true,
> +	};
> +
> +	padata_do_multithreaded(&job);
> +}

Looks fine from the padata side.

Acked-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> # padata