Hi Yury, On Sun, Feb 09, 2025 at 12:40:15PM -0500, Yury Norov wrote: > On Fri, Feb 07, 2025 at 09:40:48PM +0100, Andrea Righi wrote: > > Introduce the new helper numa_nearest_nodemask() to find the closest > > node, in a specified nodemask and state, from a given starting node. > > > > Returns MAX_NUMNODES if no node is found. > > > > Cc: Yury Norov <yury.norov@xxxxxxxxx> > > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx> > > --- > > include/linux/nodemask_types.h | 6 +++++- > > include/linux/numa.h | 8 +++++++ > > mm/mempolicy.c | 38 ++++++++++++++++++++++++++++++++++ > > 3 files changed, 51 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/nodemask_types.h b/include/linux/nodemask_types.h > > index 6b28d97ea6ed0..8d0b7a66c3a49 100644 > > --- a/include/linux/nodemask_types.h > > +++ b/include/linux/nodemask_types.h > > @@ -5,6 +5,10 @@ > > #include <linux/bitops.h> > > #include <linux/numa.h> > > > > -typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t; > > +struct nodemask { > > + DECLARE_BITMAP(bits, MAX_NUMNODES); > > +}; > > + > > +typedef struct nodemask nodemask_t; > > > > #endif /* __LINUX_NODEMASK_TYPES_H */ > > diff --git a/include/linux/numa.h b/include/linux/numa.h > > index 3567e40329ebc..a549b87d1fca5 100644 > > --- a/include/linux/numa.h > > +++ b/include/linux/numa.h > > @@ -27,6 +27,8 @@ static inline bool numa_valid_node(int nid) > > #define __initdata_or_meminfo __initdata > > #endif > > > > +struct nodemask; > > Numa should include this via linux/nodemask_types.h, or maybe > nodemask.h. Hm... nodemask_types.h needs to include numa.h to resolve MAX_NUMNODES, Maybe we can move numa_nearest_nodemask() to linux/nodemask.h? > > > + > > #ifdef CONFIG_NUMA > > #include <asm/sparsemem.h> > > > > @@ -38,6 +40,7 @@ void __init alloc_offline_node_data(int nid); > > > > /* Generic implementation available */ > > int numa_nearest_node(int node, unsigned int state); > > +int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask); > > > > #ifndef memory_add_physaddr_to_nid > > int memory_add_physaddr_to_nid(u64 start); > > @@ -55,6 +58,11 @@ static inline int numa_nearest_node(int node, unsigned int state) > > return NUMA_NO_NODE; > > } > > > > +static inline int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask) > > +{ > > + return NUMA_NO_NODE; > > +} > > + > > static inline int memory_add_physaddr_to_nid(u64 start) > > { > > return 0; > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 162407fbf2bc7..1cfee509c7229 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -196,6 +196,44 @@ int numa_nearest_node(int node, unsigned int state) > > } > > EXPORT_SYMBOL_GPL(numa_nearest_node); > > > > +/** > > + * numa_nearest_nodemask - Find the node in @mask at the nearest distance > > + * from @node. > > + * > > So, I have a feeling about this whole naming scheme. At first, this > function (and the existing numa_nearest_node()) searches for something, > but doesn't begin with find_, search_ or similar. Second, the naming > of existing numa_nearest_node() doesn't reflect that it searches > against the state. Should we always include some state for search? If > so, we can skip mentioning the state, otherwise it should be in the > name, I guess... > > The problem is that I have no idea for better naming, and I have no > understanding about the future of this functions family. If it's just > numa_nearest_node() and numa_nearest_nodemask(), I'm OK to go this > way. If we'll add more flavors similarly to find_bit() family, we > could probably discuss a naming scheme. > > Also, mm/mempolicy.c is a historical place for them, but maybe we need > to move it somewhere else? > > Any thoughts appreciated. Personally I think adding "find_" to the name would be a bit redundant, as it seems quite obvious that it's finding the nearest node. It sounds a bit like the get_something() discussion and we can just use something(). About adding "_state" to the name, it'd make sense since we have for_each_node_state/for_each_node(), but we would need to change numa_nearest_node() -> numa_nearest_node_state((), that is beyond the scope of this patch set. If I had to design this completely from scratch I'd probably propose something like this: - nearest_node_state(node, state) - nearest_node(node) -> nearest_node_state(node, N_POSSIBLE) - nearest_node_nodemask(node, nodemask) -> here the state can be managed with the nodemask (as you suggested below) But again, this is probably a more generic discussion that can be addressed in a separate thread. > > > + * @node: the node to start the search from. > > + * @state: the node state to filter nodes by. > > + * @mask: a pointer to a nodemask representing the allowed nodes. > > + * > > + * This function iterates over all nodes in the given state and calculates > > + * the distance to the starting node. > > + * > > + * Returns the node ID in @mask that is the closest in terms of distance > > + * from @node, or MAX_NUMNODES if no node is found. > > + */ > > +int numa_nearest_nodemask(int node, unsigned int state, nodemask_t *mask) > > Your only user calls the function with N_POSSIBLE: > > s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags) > { > nodemask_t unvisited = NODE_MASK_ALL; > > if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) > return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags); > > > for_each_numa_node(node, unvisited, N_POSSIBLE) > do_something(); > } > > Which means you don't need the state at all. Even more, you don't > need to initialize the unvisited mask before checking the static > branch: > > s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags) > { > nodemask_t unvisited; > > if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) > return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags); > > nodes_clear(unvisited); > > for_each_numa_node(node, unvisited) > do_something(); > } > > > If you need some state other than N_POSSIBLE, you can do it similarly: > > nodemask_complement(unvisited, N_CPU); > > /* Only N_CPU nodes iterated */ > for_each_numa_node(node, unvisited) > do_something(); Good point. I think we can implicitly assume N_POSSIBLE and if we need to filter only a certain state in the future, we can enforce that via the nodemask. > > > > +{ > > + int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES; > > + > > + if (node == NUMA_NO_NODE) > > + return MAX_NUMNODES; > > + > > + if (node_state(node, state) && node_isset(node, *mask)) > > + return node; > > This is correct, but why do we need this special case? If distance to > local node is always 0, and distance to remote node is always greater > than 0, the normal search will return local node, right? Is that a > performance trick? If so, can you put a comment please? Otherwise, > maybe just drop it? Yeah we can probably just drop it, I don't it gives much benefit in terms of performance. And the special optimiation case of searching only in one node can be managed already by the caller via SCX_PICK_IDLE_IN_NODE. > > > + > > + for_each_node_state(n, state) { > > + if (!node_isset(n, *mask)) > > + continue; > > for_each_node_state_and_mask(n, state, mask) > > Or if you take the above consideration, just > for_each_node_mask(n, mask) Ok, that's much better. Thanks, -Andrea