On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > The macros iterate thru all set/clear bits in a bitmap. They search for
> > the first bit using find_first_bit(), and the remaining bits using
> > find_next_bit().
> >
> > Since find_next_bit() is called shortly after find_first_bit(), we can
> > save a few lines of I-cache by not using find_first_bit().
>
> Is this only a speculation or does it fix a real performance problem?
>
> The macro is used like:
>
> 	for_each_set_bit(bit, addr, size) {
> 		fn(bit);
> 	}
>
> IMHO, the micro-optimization does not help when fn() is non-trivial.

The effect is measurable:

Start testing for_each_bit()
for_each_set_bit:                15296 ns,   1000 iterations
for_each_set_bit_from:           15225 ns,   1000 iterations

Start testing for_each_bit() with cash flushing
for_each_set_bit:               547626 ns,   1000 iterations
for_each_set_bit_from:          497899 ns,   1000 iterations

Refer to this:

https://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg356151.html

Thanks,
Yury

> > --- a/include/linux/find.h
> > +++ b/include/linux/find.h
> > @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
> >  #endif
> >
> >  #define for_each_set_bit(bit, addr, size) \
> > -	for ((bit) = find_first_bit((addr), (size));		\
> > +	for ((bit) = find_next_bit((addr), (size), 0);		\
> >  	     (bit) < (size);					\
> >  	     (bit) = find_next_bit((addr), (size), (bit) + 1))
> >
>
> It is not a big deal. I just think that the original code is slightly
> more self-explaining.
>
> Best Regards,
> Petr
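
For readers who want to convince themselves that the two forms of the
macro discussed above visit the same bits, here is a minimal user-space
sketch. It is not the kernel code: sketch_find_first_bit(),
sketch_find_next_bit(), the _old/_new macro names and the collect_*()
helpers are all hypothetical stand-ins written for this illustration,
and the naive search loops say nothing about the I-cache effect that the
benchmark numbers measure.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for find_next_bit(): index of the first set bit
 * at or after 'start', or 'size' if there is none. Bit-at-a-time. */
static inline unsigned long sketch_find_next_bit(const unsigned long *addr,
						 unsigned long size,
						 unsigned long start)
{
	for (unsigned long i = start; i < size; i++)
		if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return size;
}

/* Hypothetical stand-in for find_first_bit(), written independently as a
 * word-at-a-time scan so the comparison below is not trivially true. */
static inline unsigned long sketch_find_first_bit(const unsigned long *addr,
						  unsigned long size)
{
	for (unsigned long base = 0; base < size; base += BITS_PER_LONG) {
		unsigned long word = addr[base / BITS_PER_LONG];
		unsigned long bit = 0;

		if (!word)
			continue;
		while (!(word & 1UL)) {
			word >>= 1;
			bit++;
		}
		return base + bit < size ? base + bit : size;
	}
	return size;
}

/* Form before the patch: first bit via the find_first_bit() stand-in. */
#define for_each_set_bit_old(bit, addr, size)				\
	for ((bit) = sketch_find_first_bit((addr), (size));		\
	     (bit) < (size);						\
	     (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Form after the patch: first bit via find_next_bit(..., 0). */
#define for_each_set_bit_new(bit, addr, size)				\
	for ((bit) = sketch_find_next_bit((addr), (size), 0);		\
	     (bit) < (size);						\
	     (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Store every visited bit index in out[]; return how many were visited. */
static inline size_t collect_old(const unsigned long *map, unsigned long size,
				 unsigned long *out)
{
	size_t n = 0;
	unsigned long bit;

	for_each_set_bit_old(bit, map, size) {
		assert(bit < size);
		out[n++] = bit;
	}
	return n;
}

static inline size_t collect_new(const unsigned long *map, unsigned long size,
				 unsigned long *out)
{
	size_t n = 0;
	unsigned long bit;

	for_each_set_bit_new(bit, map, size) {
		assert(bit < size);
		out[n++] = bit;
	}
	return n;
}
```

Calling collect_old() and collect_new() on the same bitmap should fill
both output arrays with identical indices in ascending order, which is
the functional equivalence the patch relies on. Whether the new form is
actually faster is a separate question that only a measurement on real
hardware can answer.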