Re: [PATCH v2] mm/vmstat: Defer the refresh_zone_stat_thresholds after all CPUs bringup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 8/12/24 11:43, Saurabh Sengar wrote:
> refresh_zone_stat_thresholds function has two loops which is expensive for
> higher number of CPUs and NUMA nodes.
> 
> Below is the rough estimation of total iterations done by these loops
> based on number of NUMA and CPUs.
> 
> Total number of iterations: nCPU * 2 * Numa * mCPU
> Where:
>  nCPU = total number of CPUs
>  Numa = total number of NUMA nodes
>  mCPU = mean value of total CPUs (e.g., 512 for 1024 total CPUs)
> 
> For the system under test with 16 NUMA nodes and 1024 CPUs, this
> results in a substantial increase in the number of loop iterations
> during boot-up when NUMA is enabled:
> 
> No NUMA = 1024*2*1*512  =   1,048,576 : Here refresh_zone_stat_thresholds
> takes around 224 ms total for all the CPUs in the system under test.
> 16 NUMA = 1024*2*16*512 =  16,777,216 : Here refresh_zone_stat_thresholds
> takes around 4.5 seconds total for all the CPUs in the system under test.
> 
> Calling this for each CPU is expensive when there are large number
> of CPUs along with multiple NUMAs. Fix this by deferring
> refresh_zone_stat_thresholds to be called later at once when all the
> secondary CPUs are up. Also, register the DYN hooks to keep the
> existing hotplug functionality intact.
> 
> Signed-off-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx>
> ---
> [V2]
> 	- Move vmstat_late_init_done under CONFIG_SMP to fix
>           variable 'defined but not used' warning.
> 
>  mm/vmstat.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e2dc067a654..fa235c65c756 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1908,6 +1908,7 @@ static const struct seq_operations vmstat_op = {
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
>  int sysctl_stat_interval __read_mostly = HZ;
> +static int vmstat_late_init_done;
>  
>  #ifdef CONFIG_PROC_FS
>  static void refresh_vm_stats(struct work_struct *work)
> @@ -2110,7 +2111,8 @@ static void __init init_cpu_node_state(void)
>  
>  static int vmstat_cpu_online(unsigned int cpu)
>  {
> -	refresh_zone_stat_thresholds();
> +	if (vmstat_late_init_done)
> +		refresh_zone_stat_thresholds();
>  
>  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>  		node_set_state(cpu_to_node(cpu), N_CPU);
> @@ -2142,6 +2144,14 @@ static int vmstat_cpu_dead(unsigned int cpu)
>  	return 0;
>  }
>  
> +static int __init vmstat_late_init(void)
> +{
> +	refresh_zone_stat_thresholds();
> +	vmstat_late_init_done = 1;
> +
> +	return 0;
> +}
> +late_initcall(vmstat_late_init);>  #endif
>  
>  struct workqueue_struct *mm_percpu_wq;

late_initcall() triggered vmstat_late_init() guaranteed to be called
before the last call into vmstat_cpu_online() during a normal boot ?
Otherwise refresh_zone_stat_thresholds() will never be called unless
there is a CPU online event later.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux