On Tue, Jul 26, 2022 at 12:44 PM Russell King (Oracle) <linux@xxxxxxxxxxxxxxx> wrote:
> Overall, I would say it's pretty similar (some generic functions perform marginally better, some native ones perform marginally better), with the exception of find_first_bit() being much better with the generic implementation, but find_next_zero_bit() being noticeably worse.
The generic _find_first_bit() code is actually sane and simple. It loops over words until it finds a non-zero one, and then does trivial calculations on that last word. That explains why the generic code does so much better than your byte-wise asm.

In contrast, the generic _find_next_bit() I find almost offensively silly, which in turn explains why your byte-wise asm does better.

I think the generic _find_next_bit() should actually do what the m68k find_next_bit code does: handle the special first word itself, and then just call find_first_bit() on the rest of it.

And it should *not* try to handle the dynamic "bswap and/or bit sense invert" thing at all. That should be just four different (trivial) cases for the first word.

Linus
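A minimal userspace sketch of the structure described above, assuming bitmaps stored as arrays of unsigned long and the GCC/Clang builtin __builtin_ctzl. The sketch_* function names and the WORD_BITS and FIRST_WORD_MASK macros are hypothetical illustrations, not the kernel's lib/find_bit.c. Only two of the four first-word cases are shown (plain and bit-inverted, i.e. find_next_bit and find_next_zero_bit); the byte-swapped variants are omitted.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical: bits per word (plays the role of BITS_PER_LONG). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical: mask keeping bits >= (start % WORD_BITS) within one word. */
#define FIRST_WORD_MASK(start) (~0UL << ((start) % WORD_BITS))

/*
 * Index of the first set bit in addr[0..size), or size if none.
 * Loop over whole words until a non-zero one appears, then do the
 * trivial calculation (count trailing zeros) on that word only.
 */
static size_t sketch_find_first_bit(const unsigned long *addr, size_t size)
{
	for (size_t idx = 0; idx * WORD_BITS < size; idx++) {
		if (addr[idx]) {
			size_t bit = idx * WORD_BITS + (size_t)__builtin_ctzl(addr[idx]);
			return bit < size ? bit : size; /* ignore bits past size */
		}
	}
	return size;
}

/* Same loop, but looking for the first clear bit (inverted sense). */
static size_t sketch_find_first_zero_bit(const unsigned long *addr, size_t size)
{
	for (size_t idx = 0; idx * WORD_BITS < size; idx++) {
		if (~addr[idx]) {
			size_t bit = idx * WORD_BITS + (size_t)__builtin_ctzl(~addr[idx]);
			return bit < size ? bit : size;
		}
	}
	return size;
}

/*
 * Next set bit at or after offset: handle the special first word
 * (mask off bits below offset), then hand the rest to find_first_bit.
 */
static size_t sketch_find_next_bit(const unsigned long *addr, size_t size,
				   size_t offset)
{
	if (offset >= size)
		return size;

	size_t idx = offset / WORD_BITS;
	unsigned long word = addr[idx] & FIRST_WORD_MASK(offset);
	if (word) {
		size_t bit = idx * WORD_BITS + (size_t)__builtin_ctzl(word);
		return bit < size ? bit : size;
	}

	size_t base = (idx + 1) * WORD_BITS;
	if (base >= size)
		return size;
	/* Returns size - base when nothing is found, so the sum is size. */
	return base + sketch_find_first_bit(addr + idx + 1, size - base);
}

/* The inverted case is its own trivial first-word variant, not a flag. */
static size_t sketch_find_next_zero_bit(const unsigned long *addr, size_t size,
					size_t offset)
{
	if (offset >= size)
		return size;

	size_t idx = offset / WORD_BITS;
	unsigned long word = ~addr[idx] & FIRST_WORD_MASK(offset);
	if (word) {
		size_t bit = idx * WORD_BITS + (size_t)__builtin_ctzl(word);
		return bit < size ? bit : size;
	}

	size_t base = (idx + 1) * WORD_BITS;
	if (base >= size)
		return size;
	return base + sketch_find_first_zero_bit(addr + idx + 1, size - base);
}
```

The design choice the email argues for shows up directly here: each "next" variant does its own masking on the first word and then reuses the simple word-at-a-time "first" loop, instead of one generic routine that decides at run time whether to invert or byte-swap every word it reads.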