On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote: > In many cases people use bitmap_weight()-based functions like this: > > if (num_present_cpus() > 1) > do_something(); > > This may take considerable amount of time on many-cpus machines because > num_present_cpus() will traverse every word of underlying cpumask > unconditionally. > > We can significantly improve on it for many real cases if stop traversing > the mask as soon as we count present cpus to any number greater than 1: > > if (num_present_cpus_gt(1)) > do_something(); > > To implement this idea, the series adds bitmap_weight_{eq,gt,le} > functions together with corresponding wrappers in cpumask and nodemask. Having slept on it I have more structured thoughts: First, I like substituting bitmap_empty/full where possible - I think the change stands on its own, so could be split and sent as is. I don't like the proposed API very much. One problem is that it hides the comparison operator and makes call sites less readable: bitmap_weight(...) > N becomes: bitmap_weight_gt(..., N) and: bitmap_weight(...) <= N becomes: bitmap_weight_lt(..., N+1) or: !bitmap_weight_gt(..., N) I'd rather see something resembling memcmp() API that's known enough to be easier to grasp. For above examples: bitmap_weight_cmp(..., N) > 0 bitmap_weight_cmp(..., N) <= 0 ... This would also make the implementation easier in not having to copy and paste the code three times. Could also use a simple optimization reducing code size: #include <linux/overflow.h> int bitmap_weight_cmp(long *bits, size_t nbits, size_t cmp) { for (size_t i = 0; i < nbits / BITS_PER_LONG; ++i, ++bits) if (check_sub_overflow(cmp, popcount(*bits), &cmp)) return 1; nbits %= BITS_PER_LONG; if (nbits && check_sub_overflow(cmp, popcount(*bits & GENMASK(nbits)), &cmp)) return 1; return cmp ? -1 : 0; } Best Regards Michał Mirosław