On Thu, Feb 13, 2025 at 05:19:58PM +0100, Andrea Righi wrote:
> On Thu, Feb 13, 2025 at 10:57:00AM -0500, Yury Norov wrote:
> > On Wed, Feb 12, 2025 at 05:48:09PM +0100, Andrea Righi wrote:
> > > Introduce the new helper nearest_node_nodemask() to find the closest
> > > node in a specified nodemask from a given starting node.
> > >
> > > Returns MAX_NUMNODES if no node is found.
> > >
> > > Cc: Yury Norov <yury.norov@xxxxxxxxx>
> > > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> >
> > Suggested-by: Yury Norov [NVIDIA] <yury.norov@xxxxxxxxx>
>
> Ok.
>
> >
> > > ---
> > >  include/linux/numa.h |  7 +++++++
> > >  mm/mempolicy.c       | 32 ++++++++++++++++++++++++++++++++
> > >  2 files changed, 39 insertions(+)
> > >
> > > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > > index 31d8bf8a951a7..e6baaf6051bcf 100644
> > > --- a/include/linux/numa.h
> > > +++ b/include/linux/numa.h
> > > @@ -31,6 +31,8 @@ void __init alloc_offline_node_data(int nid);
> > >  /* Generic implementation available */
> > >  int numa_nearest_node(int node, unsigned int state);
> > >
> > > +int nearest_node_nodemask(int node, nodemask_t *mask);
> > > +
> >
> > See how you use it. It looks a bit inconsistent to the other functions:
> >
> >   #define for_each_node_numadist(node, unvisited)                       \
> >           for (int start = (node),                                      \
> >                node = nearest_node_nodemask((start), &(unvisited));     \
> >                node < MAX_NUMNODES;                                     \
> >                node_clear(node, (unvisited)),                           \
> >                node = nearest_node_nodemask((start), &(unvisited)))
> >
> >
> > I would suggest to make it aligned with the rest of the API:
> >
> >   #define node_clear(node, dst) __node_clear((node), &(dst))
> >   static __always_inline void __node_clear(int node, volatile nodemask_t *dstp)
> >   {
> >           clear_bit(node, dstp->bits);
> >   }
>
> Sorry Yury, can you elaborate more on this? What do you mean with
> inconsistent, is it the volatile nodemask_t *?

What I mean is:

  #define nearest_node_nodemask(start, srcp) __nearest_node_nodemask((start), &(srcp))
  int __nearest_node_nodemask(int node, nodemask_t *mask);

That way you'll be able to make the above for-loop look more uniform:

  #define for_each_node_numadist(node, unvisited)                       \
          for (int __s = (node),                                        \
               (node) = nearest_node_nodemask(__s, (unvisited));        \
               (node) < MAX_NUMNODES;                                   \
               node_clear((node), (unvisited)),                         \
               (node) = nearest_node_nodemask(__s, (unvisited)))

> > >  #ifndef memory_add_physaddr_to_nid
> > >  int memory_add_physaddr_to_nid(u64 start);
> > >  #endif
> > > @@ -47,6 +49,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> > >          return NUMA_NO_NODE;
> > >  }
> > >
> > > +static inline int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > +        return NUMA_NO_NODE;
> > > +}
> > > +
> > >  static inline int memory_add_physaddr_to_nid(u64 start)
> > >  {
> > >          return 0;
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 162407fbf2bc7..1e2acf187ea3a 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -196,6 +196,38 @@ int numa_nearest_node(int node, unsigned int state)
> > >  }
> > >  EXPORT_SYMBOL_GPL(numa_nearest_node);
> > >
> > > +/**
> > > + * nearest_node_nodemask - Find the node in @mask at the nearest distance
> > > + *                         from @node.
> > > + *
> > > + * @node: the node to start the search from.
> > > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > > + *
> > > + * This function iterates over all nodes in the given state and calculates
> > > + * the distance to the starting node.
> > > + *
> > > + * Returns the node ID in @mask that is the closest in terms of distance
> > > + * from @node, or MAX_NUMNODES if no node is found.
> > > + */
> > > +int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > +        int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > > +
> > > +        if (node == NUMA_NO_NODE)
> > > +                return MAX_NUMNODES;
> >
> > This makes it unclear: you make it legal to pass NUMA_NO_NODE, but
> > your function returns something useless. I don't think it would help
> > users in any reasonable scenario.
> >
> > So, if you don't want user to call this with node == NUMA_NO_NODE,
> > just describe it in comment on top of the function. Otherwise, please
> > do something useful like
> >
> >         if (node == NUMA_NO_NODE)
> >                 node = current_node;
> >
> > I would go with option 1. Notice, node_distance() doesn't bother to
> > check against NUMA_NO_NODE.
>
> Hm... is it? Looking at __node_distance(), it doesn't seem really safe to
> pass a negative value (maybe I'm missing something?).

It's not safe, but inside the kernel we don't check parameters.
As a courtesy you may decide to add a comment, but strictly
speaking you don't have to.

> Anyway, I'd also prefer to go with option 1 and not implicitly assuming
> NUMA_NO_NODE == current node (it feels that it might hide nasty bugs).

Yeah, very true.

> So, I can add a comment in the description to clarify that NUMA_NO_NODE is
> forbidden, but what if someone is passing it? Should we WARN_ON_ONCE() at
> least?

He will brick his testing board, and learn to read comments the hard
way.

Speaking more seriously, you will most likely be CCed as the author of
that function, and you will be able to comment on that in review. Also,
there's a great chance that it will be caught by KASAN or some other
sanitizer even before someone sends a buggy patch.

This is a problem as old as the world and very well known, and everyone
is aware of it.

Thanks,
Yury
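
P.S. In case it helps to see the whole pattern in one place, below is a
rough userspace sketch, not kernel code. Everything in it is a made-up
stand-in: MAX_NODES plays the role of MAX_NUMNODES, mask_t is a toy
nodemask_t, dist_table replaces node_distance(), and nearest_node_impl(),
nearest_node(), mask_clear(), for_each_node_by_dist() and visit_order()
are hypothetical names. The assert() only marks the precondition that a
sanitizer would trip over; the kernel helper would just document it. The
loop macro is also simplified: it takes the start node as a separate
argument instead of caching it in a local.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 4     /* stand-in for MAX_NUMNODES */

/* Toy nodemask: one bit per node. */
typedef struct { unsigned long bits; } mask_t;

/* Hypothetical distance table; the kernel would call node_distance(). */
static const int dist_table[MAX_NODES][MAX_NODES] = {
        {10, 20, 30, 40},
        {20, 10, 20, 30},
        {30, 20, 10, 20},
        {40, 30, 20, 10},
};

/*
 * Return the node set in *mask that is closest to start, or MAX_NODES if
 * the mask is empty. start must be a valid node id; anything else is a
 * caller bug.
 */
static int nearest_node_impl(int start, const mask_t *mask)
{
        int min_dist = INT_MAX, min_node = MAX_NODES;

        assert(start >= 0 && start < MAX_NODES);

        for (int n = 0; n < MAX_NODES; n++) {
                if (!(mask->bits & (1UL << n)))
                        continue;
                /* Strict '<' means ties go to the lowest node id. */
                if (dist_table[start][n] < min_dist) {
                        min_dist = dist_table[start][n];
                        min_node = n;
                }
        }
        return min_node;
}

/* Wrappers take the mask by name, the same way node_clear() does. */
#define nearest_node(start, src)  nearest_node_impl((start), &(src))
#define mask_clear(node, dst)     ((dst).bits &= ~(1UL << (node)))

/* Visit nodes by increasing distance from start; consumes unvisited. */
#define for_each_node_by_dist(node, start, unvisited)           \
        for ((node) = nearest_node((start), (unvisited));       \
             (node) < MAX_NODES;                                \
             mask_clear((node), (unvisited)),                   \
             (node) = nearest_node((start), (unvisited)))

/* Record the visit order into out[] and return the number of nodes. */
static int visit_order(int start, mask_t unvisited, int *out)
{
        int node, count = 0;

        for_each_node_by_dist(node, start, unvisited)
                out[count++] = node;
        return count;
}
```

With the wrapper macro taking the mask by name, the call sites inside the
loop no longer need the explicit &(unvisited), which is the uniformity I
was asking for.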