On Sat, Jun 19, 2021 at 05:24:15PM +0100, Marc Zyngier wrote:
> On Fri, 18 Jun 2021 20:57:34 +0100,
> Yury Norov <yury.norov@xxxxxxxxx> wrote:
> >
> > The macros iterate thru all set/clear bits in a bitmap. They search a
> > first bit using find_first_bit(), and the rest bits using find_next_bit().
> >
> > Since find_next_bit() is called shortly after find_first_bit(), we can
> > save few lines of I-cache by not using find_first_bit().
>
> Really?
>
> >
> > Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> > ---
> >  include/linux/find.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/find.h b/include/linux/find.h
> > index 4500e8ab93e2..ae9ed52b52b8 100644
> > --- a/include/linux/find.h
> > +++ b/include/linux/find.h
> > @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
> >  #endif
> >
> >  #define for_each_set_bit(bit, addr, size) \
> > -	for ((bit) = find_first_bit((addr), (size));		\
> > +	for ((bit) = find_next_bit((addr), (size), 0);		\
>
> On which architecture do you observe a gain? Only 32bit ARM and m68k
> implement their own version of find_first_bit(), and everyone else
> uses the canonical implementation:

And those that enable GENERIC_FIND_FIRST_BIT: x86, arm64, arc, mips
and s390.

> #ifndef find_first_bit
> #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
> #endif
>
> These architectures explicitly have different implementations for
> find_first_bit() and find_next_bit() because they can do better
> (whether that is true or not is another debate). I don't think you
> should remove this optimisation until it has been measured on these
> two architectures.

This patch is based on a series that enables a separate implementation
of find_first_bit() for all architectures; according to my tests,
find_first* is about twice as fast as find_next* on arm64 and x86.

https://lore.kernel.org/lkml/20210612123639.329047-1-yury.norov@xxxxxxxxx/T/#t

After applying the series, I noticed that my small kernel module that
calls for_each_set_bit() now uses find_first_bit() to find just the
first bit, and find_next_bit() for all the others. I think it's better
to always use find_next_bit() in this case to minimize the chance of a
cache miss. But if that's not obvious, I'll try to write a test.
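
To make the two variants of the macro concrete, here is a minimal userspace
sketch, not the kernel code. The sketch_* helpers and both macro names are
made up for illustration and are deliberately naive bit-by-bit scans; the
real find_first_bit() and find_next_bit() are optimised per architecture.
It only shows that both forms of the loop header visit the same bits, so
the patch would change which helper gets called, not what the loop does.
It says nothing about I-cache behaviour or speed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for find_next_bit(): index of the first set bit
 * at or after 'offset', or 'size' if there is none. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    for (unsigned long i = offset; i < size; i++)
        if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
            return i;
    return size;
}

/* Hypothetical stand-in for a separate find_first_bit(): skips zero words,
 * then scans the first non-zero word. Returns 'size' if no bit is set. */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
                                           unsigned long size)
{
    for (unsigned long base = 0; base < size; base += BITS_PER_LONG) {
        unsigned long word = addr[base / BITS_PER_LONG];

        if (!word)
            continue;
        for (unsigned long i = base; i < size && i < base + BITS_PER_LONG; i++)
            if (word & (1UL << (i - base)))
                return i;
    }
    return size;
}

/* Loop header as it is before the patch: first bit via find_first_bit(). */
#define for_each_set_bit_first(bit, addr, size)                      \
    for ((bit) = sketch_find_first_bit((addr), (size));              \
         (bit) < (size);                                             \
         (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Loop header as proposed: find_next_bit() with offset 0 for the first bit. */
#define for_each_set_bit_next(bit, addr, size)                       \
    for ((bit) = sketch_find_next_bit((addr), (size), 0);            \
         (bit) < (size);                                             \
         (bit) = sketch_find_next_bit((addr), (size), (bit) + 1))

/* Store the visited bit indices in 'out' (capacity 'cap'); return the count. */
static size_t collect_first(const unsigned long *map, unsigned long size,
                            unsigned long *out, size_t cap)
{
    unsigned long bit;
    size_t n = 0;

    for_each_set_bit_first(bit, map, size) {
        assert(n < cap);
        out[n++] = bit;
    }
    return n;
}

static size_t collect_next(const unsigned long *map, unsigned long size,
                           unsigned long *out, size_t cap)
{
    unsigned long bit;
    size_t n = 0;

    for_each_set_bit_next(bit, map, size) {
        assert(n < cap);
        out[n++] = bit;
    }
    return n;
}
```

Calling collect_first() and collect_next() on the same bitmap should fill
both output arrays with the same indices in the same order. Whether one
form is actually cheaper still has to be measured on real hardware.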