On Fri, 18 Jun 2021 20:57:34 +0100,
Yury Norov <yury.norov@xxxxxxxxx> wrote:
>
> The macros iterate thru all set/clear bits in a bitmap. They search a
> first bit using find_first_bit(), and the rest bits using find_next_bit().
>
> Since find_next_bit() is called shortly after find_first_bit(), we can
> save few lines of I-cache by not using find_first_bit().

Really?

>
> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> ---
>  include/linux/find.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/find.h b/include/linux/find.h
> index 4500e8ab93e2..ae9ed52b52b8 100644
> --- a/include/linux/find.h
> +++ b/include/linux/find.h
> @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
>  #endif
>
>  #define for_each_set_bit(bit, addr, size) \
> -	for ((bit) = find_first_bit((addr), (size));		\
> +	for ((bit) = find_next_bit((addr), (size), 0);		\

On which architecture do you observe a gain? Only 32bit ARM and m68k
implement their own version of find_first_bit(), and everyone else
uses the canonical implementation:

#ifndef find_first_bit
#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
#endif

These architectures explicitly have different implementations for
find_first_bit() and find_next_bit() because they can do better
(whether that is true or not is another debate). I don't think you
should remove this optimisation until it has been measured on these
two architectures.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
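
For readers following along outside a kernel tree, the fallback pattern discussed above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: every `sketch_`-prefixed name is hypothetical, the bit search is a naive loop rather than the kernel's or any architecture's implementation, and the loop condition and step of the iterator macro are assumed, since the quoted hunk shows only its first line.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper constant: number of bits in one bitmap word. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Naive stand-in for find_next_bit(): returns the index of the first set
 * bit at or after 'offset', or 'size' if there is none.
 * This is NOT the kernel implementation, only an illustration.
 */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    assert(addr != NULL);
    for (unsigned long i = offset; i < size; i++) {
        if (addr[i / SKETCH_BITS_PER_LONG] &
            (1UL << (i % SKETCH_BITS_PER_LONG)))
            return i;
    }
    return size;
}

/*
 * An architecture with a specialised routine would define
 * sketch_find_first_bit before this point. Everyone else falls back to
 * the generic search starting at offset 0.
 */
#ifndef sketch_find_first_bit
#define sketch_find_first_bit(addr, size) \
    sketch_find_next_bit((addr), (size), 0)
#endif

/* Iterator in the pre-patch shape; condition and step are assumed. */
#define sketch_for_each_set_bit(bit, addr, size)                  \
    for ((bit) = sketch_find_first_bit((addr), (size));           \
         (bit) < (size);                                          \
         (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Sums the indices of all set bits, to exercise the iterator. */
static unsigned long sketch_sum_set_bit_indices(const unsigned long *addr,
                                                unsigned long size)
{
    unsigned long bit, sum = 0;

    sketch_for_each_set_bit(bit, addr, size)
        sum += bit;
    return sum;
}
```

With the fallback in place, switching the iterator's first call to the offset-0 search changes nothing in the generated code. It only matters where an architecture supplies its own first-bit routine, which is why measurements on those architectures are the relevant evidence.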