Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> · Fri, 8 Mar 2024 12:35:37 -0500

On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
> Optimize the initialization speed of 1G huge pages through
> parallelization.
> 
> 1G hugetlbs are allocated from bootmem, a process that is already
> very fast and does not currently require optimization. Therefore,
> we focus on parallelizing only the initialization phase in
> `gather_bootmem_prealloc`.
> 
> Here are some test results:
>       test case      no patch (ms)  patched (ms)   saved
>  ------------------- -------------- ------------- --------
>   256c2T(4 node) 1G           4745          2024   57.34%
>   128c1T(2 node) 1G           3358          1712   49.02%
>      12T         1G          77000         18300   76.23%

Another great improvement.

> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
> +						    unsigned long end, void *arg)
> +{
> +	int nid;
> +
> +	for (nid = start; nid < end; nid++)
> +		gather_bootmem_prealloc_node(nid);
> +}
> +
> +static void __init gather_bootmem_prealloc(void)
> +{
> +	struct padata_mt_job job = {
> +		.thread_fn	= gather_bootmem_prealloc_parallel,
> +		.fn_arg		= NULL,
> +		.start		= 0,
> +		.size		= num_node_state(N_MEMORY),
> +		.align		= 1,
> +		.min_chunk	= 1,
> +		.max_threads	= num_node_state(N_MEMORY),
> +		.numa_aware	= true,
> +	};
> +
> +	padata_do_multithreaded(&job);
> +}

Looks fine from the padata side.

Acked-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> # padata