On Tue, Oct 17, 2017 at 2:45 PM, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:

> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.

So this is one of the things that often falls down to a per-arch
assembly optimization, cf. arch/arm/include/asm/bitops.h

On x86 I can't see any assembly optimization of this, so the generic
routines in lib/find_bit.c are used AFAICT.

This might be a silly question, but if you are testing this on x86,
do you think it would help if someone stepped in and slapped in some
optimized assembly for those functions?

(I guess that is like saying, "instead of a trivial replacement, what
about a really complicated one"?)

A simple git log arch/x86/include/asm/bitops.h doesn't show any traces
of anyone trying to optimize those for x86.

I paged in the x86 assembly people; they definitely know whether that
is a good idea or if it sucks. (And if it was done in the past.)

Yours,
Linus Walleij