Hi Geert, > -----Original Message----- > From: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> > Sent: Friday, November 10, 2023 5:25 PM > To: Wang, Xiao W <xiao.w.wang@xxxxxxxxx> > Cc: paul.walmsley@xxxxxxxxxx; palmer@xxxxxxxxxxx; > aou@xxxxxxxxxxxxxxxxx; ardb@xxxxxxxxxx; anup@xxxxxxxxxxxxxx; Li, Haicheng > <haicheng.li@xxxxxxxxx>; ajones@xxxxxxxxxxxxxxxx; Liu, Yujie > <yujie.liu@xxxxxxxxx>; charlie@xxxxxxxxxxxx; linux-riscv@xxxxxxxxxxxxxxxxxxx; > linux-efi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx > Subject: Re: [PATCH v5 2/2] riscv: Optimize bitops with Zbb extension > > Hi Xiao, > > On Tue, Oct 31, 2023 at 7:37 AM Xiao Wang <xiao.w.wang@xxxxxxxxx> wrote: > > This patch leverages the alternative mechanism to dynamically optimize > > bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When > > Zbb ext is not supported by the runtime CPU, legacy implementation is > > used. If Zbb is supported, then the optimized variants will be selected > > via alternative patching. > > > > The legacy bitops support is taken from the generic C implementation as > > fallback. > > > > If the parameter is a build-time constant, we leverage compiler builtin to > > calculate the result directly, this approach is inspired by x86 bitops > > implementation. > > > > EFI stub runs before the kernel, so alternative mechanism should not be > > used there, this patch introduces a macro NO_ALTERNATIVE for this > purpose. > > > > Signed-off-by: Xiao Wang <xiao.w.wang@xxxxxxxxx> > > Reviewed-by: Charlie Jenkins <charlie@xxxxxxxxxxxx> > > Thanks for your patch, which is now commit 457926b253200bd9 ("riscv: > Optimize bitops with Zbb extension") in riscv/for-next. > > > --- a/arch/riscv/include/asm/bitops.h > > +++ b/arch/riscv/include/asm/bitops.h > > @@ -15,13 +15,261 @@ > > #include <asm/barrier.h> > > #include <asm/bitsperlong.h> > > > > +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) > > #include <asm-generic/bitops/__ffs.h> > > -#include <asm-generic/bitops/ffz.h> > > -#include <asm-generic/bitops/fls.h> > > #include <asm-generic/bitops/__fls.h> > > +#include <asm-generic/bitops/ffs.h> > > +#include <asm-generic/bitops/fls.h> > > + > > +#else > > +#include <asm/alternative-macros.h> > > +#include <asm/hwcap.h> > > + > > +#if (BITS_PER_LONG == 64) > > +#define CTZW "ctzw " > > +#define CLZW "clzw " > > +#elif (BITS_PER_LONG == 32) > > +#define CTZW "ctz " > > +#define CLZW "clz " > > +#else > > +#error "Unexpected BITS_PER_LONG" > > +#endif > > + > > +static __always_inline unsigned long variable__ffs(unsigned long word) > > +{ > > + int num; > > + > > + asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0, > > + RISCV_ISA_EXT_ZBB, 1) > > + : : : : legacy); > > + > > + asm volatile (".option push\n" > > + ".option arch,+zbb\n" > > + "ctz %0, %1\n" > > + ".option pop\n" > > + : "=r" (word) : "r" (word) :); > > + > > + return word; > > + > > +legacy: > > + num = 0; > > +#if BITS_PER_LONG == 64 > > + if ((word & 0xffffffff) == 0) { > > + num += 32; > > + word >>= 32; > > + } > > +#endif > > + if ((word & 0xffff) == 0) { > > + num += 16; > > + word >>= 16; > > + } > > + if ((word & 0xff) == 0) { > > + num += 8; > > + word >>= 8; > > + } > > + if ((word & 0xf) == 0) { > > + num += 4; > > + word >>= 4; > > + } > > + if ((word & 0x3) == 0) { > > + num += 2; > > + word >>= 2; > > + } > > + if ((word & 0x1) == 0) > > + num += 1; > > + return num; > > +} > > Surely we can do better than duplicating include/asm-generic/bitops/__ffs.h? > > E.g. rename the generic implementation to generic___ffs(): > > -static __always_inline unsigned long __ffs(unsigned long word) > +static __always_inline unsigned long generic__ffs(unsigned long word) > { > ... > } > > +#ifndef __ffs > +#define __ffs(x) generic__ffs(x) > +#endif > > and explicitly calling the generic one here? > > Same comment for the other functions. Thanks for the suggestion. I just tried your above example and got build error of "__ffs" redefinition, I think we can change the macro condition to: +#ifndef __HAVE_ARCH___FFS +#define __ffs(word) generic___ffs(word) +#endif Besides, adding a "generic_" prefix to __ffs would make it as "generic___ffs". I saw similar API names in include/asm-generic/bitops/generic-non-atomic.h. I would send a patch for this. BRs, Xiao > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux- > m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds