On Thu, Nov 10, 2022 at 08:00:26PM -0800, Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
>
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.

...

> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> +{
> +	unsigned int first = 0, mid, last = sched_domains_numa_levels;
> +	struct cpumask ***masks;

*** ?
Hmm... Do we really need such deep indirection?

> +	int w, ret = nr_cpu_ids;
> +
> +	rcu_read_lock();
> +	masks = rcu_dereference(sched_domains_numa_masks);
> +	if (!masks)
> +		goto out;
> +
> +	while (last >= first) {
> +		mid = (last + first) / 2;
> +
> +		if (cpumask_weight_and(cpus, masks[mid][node]) <= cpu) {
> +			first = mid + 1;
> +			continue;
> +		}
> +
> +		w = (mid == 0) ? 0 : cpumask_weight_and(cpus, masks[mid - 1][node]);

See below.

> +		if (w <= cpu)
> +			break;
> +
> +		last = mid - 1;
> +	}

We have lib/bsearch.c. I haven't looked deeply into the above, but my gut
feeling is that it might be useful here. Can you check that?

> +	ret = (mid == 0) ?
> +		cpumask_nth_and(cpu - w, cpus, masks[mid][node]) :
> +		cpumask_nth_and_andnot(cpu - w, cpus, masks[mid][node], masks[mid - 1][node]);

You can also shorten this by inverting the conditional:

	ret = mid ? ...not 0... : ...for 0...;

> +out:

out_unlock: ?

> +	rcu_read_unlock();
> +	return ret;
> +}

-- 
With Best Regards,
Andy Shevchenko
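
For illustration only, here is a minimal user-space sketch of the hop search
discussed above. It is not the kernel code and not the patch author's
implementation: plain uint64_t bitmasks stand in for struct cpumask, and the
names weight_and(), nth_bit(), find_nth_by_hops() and NO_CPU are hypothetical
helpers invented for this example. It assumes, as the commit message states,
that each hop mask includes the previous one. It uses a half-open lower-bound
binary search (so the unsigned bounds cannot wrap) and the inverted
"lo ? ... : ..." form of the final conditional.

```c
#include <stdint.h>

#define NO_CPU 64u	/* hypothetical "not found" value, like nr_cpu_ids */

/* Stand-in for cpumask_weight_and(): count bits set in both masks. */
static unsigned int weight_and(uint64_t a, uint64_t b)
{
	uint64_t m = a & b;
	unsigned int w = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each round */
		w++;
	return w;
}

/* Return the n-th (0-based) set bit of m, or NO_CPU if there is none. */
static unsigned int nth_bit(uint64_t m, unsigned int n)
{
	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(m & (UINT64_C(1) << bit)))
			continue;
		if (n-- == 0)
			return bit;
	}
	return NO_CPU;
}

/*
 * Find the n-th CPU of 'cpus', ordered by hop distance.
 * hop_masks[i] must be a superset of hop_masks[i - 1].
 */
static unsigned int find_nth_by_hops(uint64_t cpus, const uint64_t *hop_masks,
				     unsigned int nr_hops, unsigned int n)
{
	unsigned int lo = 0, hi = nr_hops, mid, w;

	/* Lower bound: first hop with more than n CPUs in common with cpus. */
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (weight_and(cpus, hop_masks[mid]) <= n)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == nr_hops)
		return NO_CPU;	/* fewer than n + 1 CPUs in total */

	/* CPUs already covered by the closer hops. */
	w = lo ? weight_and(cpus, hop_masks[lo - 1]) : 0;

	return lo ? nth_bit(cpus & hop_masks[lo] & ~hop_masks[lo - 1], n - w)
		  : nth_bit(cpus & hop_masks[0], n);
}
```

With hop masks { 0x0C, 0x3C, 0xFF } and cpus = 0xFF, the sketch walks the
CPUs in the order 2, 3, 4, 5, 0, 1, 6, 7, using O(log n) weight computations
per lookup.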