Re: Linux 5.19-rc8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 26, 2022 at 01:20:23PM -0700, Linus Torvalds wrote:
On Tue, Jul 26, 2022 at 12:44 PM Russell King (Oracle)
<linux@xxxxxxxxxxxxxxx> wrote:

Overall, I would say it's pretty similar (some generic perform
marginally better, some native perform marginally better) with the
exception of find_first_bit() being much better with the generic
implementation, but find_next_zero_bit() being noticably worse.

The generic _find_first_bit() code is actually sane and simple. It
loops over words until it finds a non-zero one, and then does trivial
calculations on that last word.

That explains why the generic code does so much better than your byte-wise asm.

In contrast, the generic _find_next_bit() I find almost offensively
silly - which in turn explains why your byte-wide asm does better.

I think the generic _find_next_bit() should actually do what the m68k
find_next_bit code does: handle the first special word itself, and
then just call find_first_bit() on the rest of it.

And it should *not* try to handle the dynamic "bswap and/or bit sense
invert" thing at all. That should be just four different (trivial)
cases for the first word.

Here's the results for the native version converted to use word loads:

[   37.319937]
               Start testing find_bit() with random-filled bitmap
[   37.330289] find_next_bit:                 2222703 ns, 163781 iterations
[   37.339186] find_next_zero_bit:            2154375 ns, 163900 iterations
[   37.348118] find_last_bit:                 2208104 ns, 163780 iterations
[   37.372564] find_first_bit:               17722203 ns,  16370 iterations
[   37.737415] find_first_and_bit:          358135191 ns,  32453 iterations
[   37.745420] find_next_and_bit:             1280537 ns,  73644 iterations
[   37.752143]
               Start testing find_bit() with sparse bitmap
[   37.759032] find_next_bit:                   41256 ns,    655 iterations
[   37.769905] find_next_zero_bit:            4148410 ns, 327026 iterations
[   37.776675] find_last_bit:                   48742 ns,    655 iterations
[   37.790961] find_first_bit:                7562371 ns,    655 iterations
[   37.797743] find_first_and_bit:              47366 ns,      1 iterations
[   37.804527] find_next_and_bit:               59924 ns,      1 iterations

which is generally faster than the generic version, with the exception
of the sparse find_first_bit (generic was:
[   25.657304] find_first_bit:                7328573 ns,    656 iterations)

find_next_{,zero_}bit() in the sparse case are quite a bit faster than
the generic code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!



[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux