cpumask_local_spread() currently checks the local node for the presence of
the i'th CPU, and if it finds nothing, makes a flat search among all
non-local CPUs. We can do better by checking CPUs per NUMA hop.

This has significant performance implications on NUMA machines, for example
when using NUMA-aware allocated memory together with NUMA-aware IRQ affinity
hints.

Performance tests from patch 8 of this series for the Mellanox network
driver show:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+------------------+------------------+
|                         | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline                | 52.3      | 6.4 %            | 17.9 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
+-------------------------+-----------+------------------+------------------+

The bottleneck on the RX side is released and line rate is reached (~1.8x
speedup). ~30% less CPU util on TX.

This series was supposed to be included in v6.2, but that didn't happen. It
has spent enough time in -next without any issues, so I hope we'll finally
see it in v6.3.

I believe the best way would be to move it with the scheduler patches, but
I'm OK to try again with the bitmap branch as well.
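
To illustrate the difference between the flat search and the per-hop search
described at the top of this letter, here is a minimal userspace sketch. It
is not the code from the series: the 4-node topology, the node_distance[]
and cpu_node[] tables, and the helpers nth_cpu_flat() and nth_cpu_by_hops()
are all hypothetical and exist only for this example. Unlike the kernel
function, the sketch simply returns -1 when i is out of range.

```c
#include <stddef.h>

#define NR_NODES 4
#define NR_CPUS  8

/* Hypothetical NUMA distance table: node_distance[a][b]. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 20, 30 },
	{ 20, 10, 30, 20 },
	{ 20, 30, 10, 20 },
	{ 30, 20, 20, 10 },
};

/* Hypothetical CPU-to-node map: two CPUs per node. */
static const int cpu_node[NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/*
 * Flat search: local CPUs first, then every non-local CPU in plain
 * index order, ignoring how far away its node is.
 */
static int nth_cpu_flat(unsigned int i, int node)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_node[cpu] == node) {
			if (i == 0)
				return cpu;
			i--;
		}
	}
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_node[cpu] != node) {
			if (i == 0)
				return cpu;
			i--;
		}
	}
	return -1;
}

/*
 * Per-hop search: walk the distance levels seen from 'node' in
 * increasing order and count CPUs level by level, so that closer
 * nodes are always exhausted before more distant ones.
 */
static int nth_cpu_by_hops(unsigned int i, int node)
{
	int prev = -1;	/* distance of the level handled last */

	for (;;) {
		int next = -1;

		/* Smallest distance strictly greater than 'prev'. */
		for (int n = 0; n < NR_NODES; n++) {
			int d = node_distance[node][n];

			if (d > prev && (next < 0 || d < next))
				next = d;
		}
		if (next < 0)
			return -1;	/* no more levels: i is out of range */

		/* Count CPUs whose node sits exactly at this level. */
		for (int cpu = 0; cpu < NR_CPUS; cpu++) {
			if (node_distance[node][cpu_node[cpu]] == next) {
				if (i == 0)
					return cpu;
				i--;
			}
		}
		prev = next;
	}
}
```

With this made-up topology and node 3 as the local node, the flat search
hands out CPUs in the order 6,7,0,1,2,3,4,5, so the third CPU already comes
from the most distant node. The per-hop search yields 6,7,2,3,4,5,0,1 and
only falls back to node 0 once the nearer nodes are used up.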

Tariq Toukan (1):
  net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints

Valentin Schneider (2):
  sched/topology: Introduce sched_numa_hop_mask()
  sched/topology: Introduce for_each_numa_hop_mask()

Yury Norov (6):
  lib/find: introduce find_nth_and_andnot_bit
  cpumask: introduce cpumask_nth_and_andnot
  sched: add sched_numa_find_nth_cpu()
  cpumask: improve on cpumask_local_spread() locality
  lib/cpumask: reorganize cpumask_local_spread() logic
  lib/cpumask: update comment for cpumask_local_spread()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++-
 include/linux/cpumask.h                      | 20 +++++
 include/linux/find.h                         | 33 +++++++
 include/linux/topology.h                     | 33 +++++++
 kernel/sched/topology.c                      | 90 ++++++++++++++++++++
 lib/cpumask.c                                | 52 ++++++-----
 lib/find_bit.c                               |  9 ++
 7 files changed, 230 insertions(+), 25 deletions(-)

--
2.34.1