Re: [PATCH v2] mm: Reset numa stats for boot pagesets

On 5/7/20 6:29 PM, Sandipan Das wrote:
> Initially, the per-cpu pagesets of each zone are set to the
> boot pagesets. The real pagesets are allocated later but
> before that happens, page allocations do occur and the numa
> stats for the boot pagesets get incremented since they are
> common to all zones at that point.
> 
> The real pagesets, however, are allocated for the populated
> zones only. Unpopulated zones, like those associated with
> memory-less nodes, continue using the boot pageset and end
> up skewing the numa stats of the corresponding node.
> 
> E.g.
> 
>   $ numactl -H
>   available: 2 nodes (0-1)
>   node 0 cpus: 0 1 2 3
>   node 0 size: 0 MB
>   node 0 free: 0 MB
>   node 1 cpus: 4 5 6 7
>   node 1 size: 8131 MB
>   node 1 free: 6980 MB
>   node distances:
>   node   0   1
>     0:  10  40
>     1:  40  10
> 
>   $ numastat
>                              node0           node1
>   numa_hit                     108           56495
>   numa_miss                      0               0
>   numa_foreign                   0               0
>   interleave_hit                 0            4537
>   local_node                   108           31547
>   other_node                     0           24948
> 
> Hence, the boot pageset stats need to be cleared after
> the real pagesets are allocated.
> 
> From this point onwards, the stats of the boot pagesets do
> not change as page allocations requested for a memory-less
> node will either fail (if __GFP_THISNODE is used) or get
> fulfilled by a preferred zone of a different node based on
> the fallback zonelist.
> 
> Signed-off-by: Sandipan Das <sandipan@xxxxxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
With one suggestion below.

> ---
> 
> The previous version and discussion around it can be found at
> https://lore.kernel.org/linux-mm/20200504070304.127361-1-sandipan@xxxxxxxxxxxxx/
> 
> Changes in v2:
> 
> - Reset the stats of the boot pagesets instead of explicitly
>   returning zero as suggested by Vlastimil.
> 
> - Changed the subject to reflect the above.
> 
> ---
>  mm/page_alloc.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 69827d4fa052..1543e32f7e4e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6256,6 +6256,25 @@ void __init setup_per_cpu_pageset(void)
>  	for_each_populated_zone(zone)
>  		setup_zone_pageset(zone);
>  
> +#ifdef CONFIG_NUMA
> +	if (static_branch_likely(&vm_numa_stat_key)) {

I would just remove this test and do the reset unconditionally, as the branch
can only be disabled later in boot, by a sysctl.

> +		struct per_cpu_pageset *pcp;
> +		int cpu;
> +
> +		/*
> +		 * Unpopulated zones continue using the boot pagesets.
> +		 * The numa stats for these pagesets need to be reset.
> +		 * Otherwise, they will end up skewing the stats of
> +		 * the nodes these zones are associated with.
> +		 */
> +		for_each_possible_cpu(cpu) {
> +			pcp = &per_cpu(boot_pageset, cpu);
> +			memset(pcp->vm_numa_stat_diff, 0,
> +			       sizeof(pcp->vm_numa_stat_diff));
> +		}
> +	}
> +#endif
> +
>  	for_each_online_pgdat(pgdat)
>  		pgdat->per_cpu_nodestats =
>  			alloc_percpu(struct per_cpu_nodestat);
> 
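
For illustration only, here is a rough userspace model of the unconditional
reset suggested above. It is not kernel code: the struct, the array standing
in for the per-cpu boot_pageset, the CPU count and all model_* names are
assumptions made up for this sketch. It only shows that counts accumulated
during early boot disappear once the diff arrays are zeroed for every CPU.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_CPUS        4  /* hypothetical CPU count for the model */
#define MODEL_NR_NUMA_STATS  6  /* one slot per numastat row; an assumption */

/* Simplified stand-in for struct per_cpu_pageset (only the NUMA diffs). */
struct model_pageset {
	unsigned short vm_numa_stat_diff[MODEL_NR_NUMA_STATS];
};

/* Stand-in for the per-cpu boot_pageset shared by all zones early in boot. */
static struct model_pageset model_boot_pageset[MODEL_NR_CPUS];

/* An early allocation bumps a counter in the boot pageset of that CPU. */
void model_early_alloc(int cpu, int item)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(item >= 0 && item < MODEL_NR_NUMA_STATS);
	model_boot_pageset[cpu].vm_numa_stat_diff[item]++;
}

/*
 * Unconditional reset, mirroring the loop in the patch without the
 * static-branch test: zero the NUMA stat diffs of every CPU's boot pageset.
 */
void model_reset_boot_stats(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_pageset *pcp = &model_boot_pageset[cpu];

		memset(pcp->vm_numa_stat_diff, 0,
		       sizeof(pcp->vm_numa_stat_diff));
	}
}

/* Sum one stat item over all CPUs, roughly what a reader of the stats sees. */
unsigned long model_sum_stat(int item)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += model_boot_pageset[cpu].vm_numa_stat_diff[item];
	return sum;
}
```

Calling model_early_alloc() a few times and then model_reset_boot_stats()
leaves model_sum_stat() at zero for every item, which is the state the
unpopulated zones should report after setup_per_cpu_pageset().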