On Mon, Apr 25, 2022 at 06:39:26AM +0300, Kirill A. Shutemov wrote: [snip] > > +static __always_inline void __set_bit(long nr, volatile unsigned long *addr) Can't we update the existing set_bit function? > +{ > + asm volatile(__ASM_SIZE(bts) " %1,%0" : : "m" (*(volatile long *) addr), Why do we need the cast here? > + "Ir" (nr) : "memory"); Shouldn't we add "cc" to the clobber list? > +} > + > +static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) > +{ > + asm volatile(__ASM_SIZE(btr) " %1,%0" : : "m" (*(volatile long *) addr), > + "Ir" (nr) : "memory"); > +} Same comments of __set_bit apply here (except there is no clear_bit function) [snip] > + > +static __always_inline unsigned long swab(const unsigned long y) > +{ > +#if __BITS_PER_LONG == 64 > + return __builtin_bswap32(y); > +#else /* __BITS_PER_LONG == 32 */ > + return __builtin_bswap64(y); Suppose y = 0x11223344UL, then, the compiler to cast it to a 64 bits value yielding 0x0000000011223344ULL, __builtin_bswap64 will then return 0x4433221100000000, and the return value will be casted back to 32 bits, so swapb will always return 0, won't it? > +#endif > +} > + > +unsigned long _find_next_bit(const unsigned long *addr1, > + const unsigned long *addr2, unsigned long nbits, The addr2 name seems a bit misleading, it seems to act as some kind of mask, is that right? > + unsigned long start, unsigned long invert, unsigned long le) > +{ > + unsigned long tmp, mask; > + > + if (unlikely(start >= nbits)) > + return nbits; > + > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > + > + /* Handle 1st word. */ > + mask = BITMAP_FIRST_WORD_MASK(start); > + if (le) > + mask = swab(mask); > + > + tmp &= mask; > + > + start = round_down(start, BITS_PER_LONG); > + > + while (!tmp) { > + start += BITS_PER_LONG; > + if (start >= nbits) > + return nbits; > + > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > + } Isn't better to divide start by BITS_PER_LONG in the beginning of the fuction, and then multiply it by BITS_PER_LONG when necessary, saving the division operations in the while loop? [snip]