Re: [RFC PATCH v4 3/7] mm/demotion: Build demotion targets based on explicit memory tiers

On Fri, 27 May 2022 17:55:24 +0530
"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> wrote:

> From: Jagdish Gediya <jvgediya@xxxxxxxxxxxxx>
> 
> This patch switch the demotion target building logic to use memory tiers
> instead of NUMA distance. All N_MEMORY NUMA nodes will be placed in the
> default tier 1 and additional memory tiers will be added by drivers like
> dax kmem.
> 
> This patch builds the demotion target for a NUMA node by looking at all
> memory tiers below the tier to which the NUMA node belongs. The closest node
> in the immediately following memory tier is used as a demotion target.
> 
> Since we are now only building demotion target for N_MEMORY NUMA nodes
> the CPU hotplug calls are removed in this patch.
> 
> Signed-off-by: Jagdish Gediya <jvgediya@xxxxxxxxxxxxx>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>

Hi

Diff made a mess of this one!

Anyhow, a few comments inline.

Thanks,

Jonathan


> --- a/mm/migrate.c
> +++ b/mm/migrate.c

> +/*
> + * node_demotion[] examples:

Perhaps call out that these are examples of possible default configurations.
None of them is enforced by this code.

> + *
> + * Example 1:
> + *
> + * Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM nodes.
> + *
> + * node distances:
> + * node   0    1    2    3
> + *    0  10   20   30   40
> + *    1  20   10   40   30
> + *    2  30   40   10   40
> + *    3  40   30   40   10
> + *
> + * memory_tiers[0] = <empty>
> + * memory_tiers[1] = 0-1
> + * memory_tiers[2] = 2-3
> + *
> + * node_demotion[0].preferred = 2
> + * node_demotion[1].preferred = 3
> + * node_demotion[2].preferred = <empty>
> + * node_demotion[3].preferred = <empty>
> + *
> + * Example 2:
> + *
> + * Node 0 & 1 are CPU + DRAM nodes, node 2 is memory-only DRAM node.
> + *
> + * node distances:
> + * node   0    1    2
> + *    0  10   20   30
> + *    1  20   10   30
> + *    2  30   30   10
> + *
> + * memory_tiers[0] = <empty>
> + * memory_tiers[1] = 0-2
> + * memory_tiers[2] = <empty>
> + *
> + * node_demotion[0].preferred = <empty>
> + * node_demotion[1].preferred = <empty>
> + * node_demotion[2].preferred = <empty>
> + *
> + * Example 3:
> + *
> + * Node 0 is CPU + DRAM nodes, Node 1 is HBM node, node 2 is PMEM node.
> + *
> + * node distances:
> + * node   0    1    2
> + *    0  10   20   30
> + *    1  20   10   40
> + *    2  30   40   10
> + *
> + * memory_tiers[0] = 1
> + * memory_tiers[1] = 0
> + * memory_tiers[2] = 2
> + *
> + * node_demotion[0].preferred = 2
> + * node_demotion[1].preferred = 0
> + * node_demotion[2].preferred = <empty>
> + *
> + */



>  /* Disable reclaim-based migration. */
>  static void __disable_all_migrate_targets(void)
>  {

> +	int node;
>  
> +	for_each_node_mask(node, node_states[N_MEMORY])
> +		node_demotion[node].preferred = NODE_MASK_NONE;
>  }

>  /*
> +* Find an automatic demotion target for all memory
> +* nodes. Failing here is OK.  It might just indicate
> +* being at the end of a chain.
> +*/
> +static void establish_migration_targets(void)
>  {
Diff did a horrible job on this, so I've reformatted it heavily
so that I could see what was happening!

>  	struct demotion_nodes *nd;
> +	int tier, target = NUMA_NO_NODE, node;
> +	int distance, best_distance;
> +	nodemask_t used;
>  
>  	if (!node_demotion)
> +		return;
>  
> +	disable_all_migrate_targets();
> +	for_each_node_mask(node, node_states[N_MEMORY]) {
> +		best_distance = -1;
> +		nd = &node_demotion[node];
>  
> +		tier = __node_get_memory_tier(node);
> +		/*
> +		 * Find next tier to demote.

In the discussion of Wei Xu's RFC we concluded that we need
to allow demotion to the nearest node in 'any' higher tier
(now bigger rank), not only the immediately following one.
That functionality matters for even moderately complex systems.

> +		 */
> +		while (++tier < MAX_MEMORY_TIERS) {
> +			if (memory_tiers[tier])
> +				break;
> +		}

> +		if (tier >= MAX_MEMORY_TIERS)
> +			continue;
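
To illustrate the 'any tier' point, here is a minimal userspace sketch that
compares the rule in this patch (nearest node in the first non-empty tier of
bigger rank) with the alternative (nearest node in any tier of bigger rank).
Everything in it is an assumption made for illustration: the 4-node topology,
the distance table, the tier assignment, and the helper names are hypothetical
and are not kernel API. Nodemasks are modelled as plain bitmasks.

```c
#define NR_NODES  4
#define MAX_TIERS 4
#define NO_NODE   (-1)

/* Hypothetical topology: nodes 0-1 are DRAM, node 2 is near node 0, node 3 is near node 1. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 30, 60 },
	{ 20, 10, 60, 30 },
	{ 30, 60, 10, 80 },
	{ 60, 30, 80, 10 },
};

/* Hypothetical tier assignment: a bigger number means a slower tier. */
static const int tier_of[NR_NODES] = { 1, 1, 2, 3 };

/* Bitmask of all nodes that belong to the given tier. */
static unsigned int tier_mask(int tier)
{
	unsigned int mask = 0;

	for (int n = 0; n < NR_NODES; n++)
		if (tier_of[n] == tier)
			mask |= 1u << n;
	return mask;
}

/* Nearest node to 'node' among the nodes set in 'mask', or NO_NODE if the mask is empty. */
static int nearest_in_mask(int node, unsigned int mask)
{
	int best = NO_NODE;

	for (int n = 0; n < NR_NODES; n++) {
		if (!(mask & (1u << n)))
			continue;
		if (best == NO_NODE || dist[node][n] < dist[node][best])
			best = n;
	}
	return best;
}

/* Rule in this patch: only look at the first non-empty tier after the node's own. */
static int target_next_tier(int node)
{
	for (int tier = tier_of[node] + 1; tier < MAX_TIERS; tier++) {
		unsigned int mask = tier_mask(tier);

		if (mask)
			return nearest_in_mask(node, mask);
	}
	return NO_NODE;
}

/* Alternative: consider every node in any tier of bigger rank. */
static int target_any_lower_tier(int node)
{
	unsigned int mask = 0;

	for (int tier = tier_of[node] + 1; tier < MAX_TIERS; tier++)
		mask |= tier_mask(tier);
	return nearest_in_mask(node, mask);
}
```

In this made-up topology, target_next_tier(1) picks node 2 at distance 60,
whereas target_any_lower_tier(1) picks node 3 at distance 30. Node 0 gets
node 2 under both rules.
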
>  
> +		nodes_andnot(used, node_states[N_MEMORY], memory_tiers[tier]->nodelist); 

I'm a bit lost on this one.  Perhaps add a comment to say what 'used' represents?
I was expecting all memory nodes in tiers with rank > current tier. I'm not sure that's what
we have here.
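
For reference, my reading of the nodes_andnot() call as a minimal userspace
sketch. It assumes nodemasks can be modelled as plain bitmasks (bit n set means
node n is in the mask); the type and helper names are hypothetical, not kernel
API. If this reading is right, 'used' is every memory node outside the chosen
target tier, so the search can only return nodes in that one tier.

```c
/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef unsigned int mask_t;

/* Models nodes_andnot(dst, src1, src2): dst = src1 & ~src2. */
static mask_t mask_andnot(mask_t src1, mask_t src2)
{
	return src1 & ~src2;
}

/* 'used' as built in the patch: all N_MEMORY nodes that are not in the target tier. */
static mask_t build_used(mask_t n_memory, mask_t target_tier_nodes)
{
	return mask_andnot(n_memory, target_tier_nodes);
}

/* Nodes still eligible for the search: memory nodes not excluded by 'used'. */
static mask_t candidates(mask_t n_memory, mask_t used)
{
	return mask_andnot(n_memory, used);
}
```

Taking Example 3 from the comment (tier 0 = node 1, tier 1 = node 0,
tier 2 = node 2) and demoting from node 1, the target tier is {0} and
build_used(0x7, 0x1) gives 0x6, i.e. nodes 1 and 2. Node 2 sits in a tier of
bigger rank and is still masked out.
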

>  
>  		/*
> +		 * Find all the nodes in the memory tier node list of same best distance.
> +		 * add add them to the preferred mask. We randomly select between nodes

Typo: repeated 'add'.

> +		 * in the preferred mask when allocating pages during demotion.
>  		 */
>  		do {
> +			target = find_next_best_node(node, &used);
> +			if (target == NUMA_NO_NODE)
>  				break;
>  
> +			distance = node_distance(node, target);
> +			if (distance == best_distance || best_distance == -1) {
> +				best_distance = distance;
> +				node_set(target, nd->preferred);
> +			} else {
> +				break;
> +			}
>  		} while (1);
>  	}

>  }
>  
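
A minimal userspace sketch of how I read the do/while loop: keep taking the
next-best node and add it to the preferred mask for as long as its distance
equals the best distance seen first. The distance table is hypothetical, and
next_best_node() is a simplified stand-in for find_next_best_node() that ranks
by distance only; the real function also applies penalties (for example for
nodes with CPUs), which are ignored here.

```c
#define NR_NODES 4
#define NO_NODE  (-1)

/* Hypothetical distances: nodes 1 and 2 are equally far from node 0, node 3 is further. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 30, 30, 40 },
	{ 30, 10, 20, 20 },
	{ 30, 20, 10, 20 },
	{ 40, 20, 20, 10 },
};

/*
 * Simplified stand-in for find_next_best_node(): return the nearest node
 * not yet set in *used and mark it used, or NO_NODE if none is left.
 */
static int next_best_node(int node, unsigned int *used)
{
	int best = NO_NODE;

	for (int n = 0; n < NR_NODES; n++) {
		if (*used & (1u << n))
			continue;
		if (best == NO_NODE || dist[node][n] < dist[node][best])
			best = n;
	}
	if (best != NO_NODE)
		*used |= 1u << best;
	return best;
}

/* Collect every candidate that ties with the first (best) distance. */
static unsigned int preferred_targets(int node, unsigned int used)
{
	unsigned int preferred = 0;
	int best_distance = -1;

	for (;;) {
		int target = next_best_node(node, &used);

		if (target == NO_NODE)
			break;

		int distance = dist[node][target];

		/* Stop at the first candidate that is further than the best one. */
		if (best_distance != -1 && distance != best_distance)
			break;

		best_distance = distance;
		preferred |= 1u << target;
	}
	return preferred;
}
```

With only node 0 masked out (used = 0x1), preferred_targets(0, 0x1) returns
0x6: nodes 1 and 2 tie at distance 30, and node 3 at distance 40 ends the loop.
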




