On Fri, Sep 23, 2022 at 04:55:41PM +0100, Valentin Schneider wrote:
> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
> reachable within a given distance budget, but this means each successive
> cpumask is a superset of the previous one.
>
> Code wanting to allocate one item per CPU (e.g. IRQs) at increasing
> distances would thus need to allocate a temporary cpumask to note which
> CPUs have already been visited. This can be prevented by leveraging
> for_each_cpu_andnot() - package all that logic into one ugl^D fancy macro.
>
> Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx>
> ---
>  include/linux/topology.h | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 3e91ae6d0ad5..7aa7e6a4c739 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -257,5 +257,42 @@ static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
>  }
>  #endif /* CONFIG_NUMA */
>
> +/**
> + * for_each_numa_hop_cpu - iterate over CPUs by increasing NUMA distance,
> + *                         starting from a given node.
> + * @cpu: the iteration variable.
> + * @node: the NUMA node to start the search from.
> + *
> + * Requires rcu_lock to be held.
> + * Careful: this is a double loop, 'break' won't work as expected.

This warning concerns me, not only because the new iteration macro hides
complexity and breaks 'break' (sic!), but also because it looks too
specific. Why don't you split it? So instead of:

	for_each_numa_hop_cpu(cpu, dev->priv.numa_node) {
		cpus[i] = cpu;
		if (++i == ncomp_eqs)
			goto spread_done;
	}

in the following patch, you would have something like this:

	for_each_node_hop(hop, node) {
		const struct cpumask *hop_cpus = sched_numa_hop_mask(node, hop);

		for_each_cpu_andnot(cpu, hop_cpus, ...) {
			cpus[i] = cpu;
			if (++i == ncomp_eqs)
				goto spread_done;
		}
	}

It looks more bulky, but I believe there will be more users for
for_each_node_hop() alone.
On top of that, if you really like it, you can still implement
for_each_numa_hop_cpu() on top of that if you want.

> + * Implementation notes:
> + *
> + * Providing it is valid, the mask returned by
> + *	sched_numa_hop_mask(node, hops+1)
> + * is a superset of the one returned by
> + *	sched_numa_hop_mask(node, hops)
> + * which may not be that useful for drivers that try to spread things out and
> + * want to visit a CPU not more than once.
> + *
> + * To accommodate for that, we use for_each_cpu_andnot() to iterate over the cpus
> + * of sched_numa_hop_mask(node, hops+1) with the CPUs of
> + * sched_numa_hop_mask(node, hops) removed, IOW we only iterate over CPUs
> + * a given distance away (rather than *up to* a given distance).
> + *
> + * hops=0 forces us to play silly games: we pass cpu_none_mask to
> + * for_each_cpu_andnot(), which turns it into for_each_cpu().
> + */
> +#define for_each_numa_hop_cpu(cpu, node)				      \
> +	for (struct { const struct cpumask *curr, *prev; int hops; } __v =    \
> +		{ sched_numa_hop_mask(node, 0), NULL, 0 };		      \

This anonymous structure is never used as a structure. Why do you define
it? Why not just declare hops, prev and curr as plain variables instead
of packing them together?

Thanks,
Yury

> +	     !IS_ERR_OR_NULL(__v.curr);					      \
> +	     __v.hops++,						      \
> +	     __v.prev = __v.curr,					      \
> +	     __v.curr = sched_numa_hop_mask(node, __v.hops))		      \
> +		for_each_cpu_andnot(cpu,				      \
> +				    __v.curr,				      \
> +				    __v.hops ? __v.prev : cpu_none_mask)
>
>  #endif /* _LINUX_TOPOLOGY_H */
> --
> 2.31.1