On Tue, 2022-06-07 at 15:51 -0700, Tim Chen wrote: > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote: > > > > +int next_demotion_node(int node) > > +{ > > + struct demotion_nodes *nd; > > + int target, nnodes, i; > > + > > + if (!node_demotion) > > + return NUMA_NO_NODE; > > + > > + nd = &node_demotion[node]; > > + > > + /* > > + * node_demotion[] is updated without excluding this > > + * function from running. > > + * > > + * Make sure to use RCU over entire code blocks if > > + * node_demotion[] reads need to be consistent. > > + */ > > + rcu_read_lock(); > > + > > + nnodes = nodes_weight(nd->preferred); > > + if (!nnodes) > > + return NUMA_NO_NODE; > > + > > + /* > > + * If there are multiple target nodes, just select one > > + * target node randomly. > > + * > > + * In addition, we can also use round-robin to select > > + * target node, but we should introduce another variable > > + * for node_demotion[] to record last selected target node, > > + * that may cause cache ping-pong due to the changing of > > + * last target node. Or introducing per-cpu data to avoid > > + * caching issue, which seems more complicated. So selecting > > + * target node randomly seems better until now. > > + */ > > + nnodes = get_random_int() % nnodes; > > + target = first_node(nd->preferred); > > + for (i = 0; i < nnodes; i++) > > + target = next_node(target, nd->preferred); > > We can simplify the above 4 lines. > > target = node_random(nd->preferred); > > There's still a loop overhead though :( To avoid loop overhead, we can use the original implementation of next_demotion_node. The performance is much better for the most common cases, the number of preferred node is 1. Best Regards, Huang, Ying > > > > > > + > > + rcu_read_unlock(); > > + > > + return target; > > +} > > + > > > > + */ > > +static int __meminit migrate_on_reclaim_callback(struct notifier_block *self, > > + unsigned long action, void *_arg) > > +{ > > + struct memory_notify *arg = _arg; > > + > > + /* > > + * Only update the node migration order when a node is > > + * changing status, like online->offline. > > + */ > > + if (arg->status_change_nid < 0) > > + return notifier_from_errno(0); > > + > > + switch (action) { > > + case MEM_OFFLINE: > > + /* > > + * In case we are moving out of N_MEMORY. Keep the node > > + * in the memory tier so that when we bring memory online, > > + * they appear in the right memory tier. We still need > > + * to rebuild the demotion order. > > + */ > > + mutex_lock(&memory_tier_lock); > > + establish_migration_targets(); > > + mutex_unlock(&memory_tier_lock); > > + break; > > + case MEM_ONLINE: > > + /* > > + * We ignore the error here, if the node already have the tier > > + * registered, we will continue to use that for the new memory > > + * we are adding here. > > + */ > > + node_set_memory_tier(arg->status_change_nid, DEFAULT_MEMORY_TIER); > > Should establish_migration_targets() be run here? Otherwise what are the > demotion targets for this newly onlined node? > > > + break; > > + } > > + > > + return notifier_from_errno(0); > > +} > > + > > Tim >