Re: Linux 5.19-rc8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 26, 2022 at 11:36:21AM -0700, Linus Torvalds wrote:
On Tue, Jul 26, 2022 at 11:18 AM Yury Norov <yury.norov@xxxxxxxxx> wrote:

We have find_bit_benchmark to check how it works in practice. Would
be great if someone with access to the hardware can share numbers.

Honestly, I doubt benchmarking find_bit in a loop is all that sensible.

Yes, that's what I was thinking - I've never seen it crop up in any of
the perf traces I've seen.

Nevertheless, here's some numbers from a single run of the
find_bit_benchmark module, kernel built with:
arm-linux-gnueabihf-gcc (Debian 10.2.1-6) 10.2.1 20210110

Current native implementation:

[   46.184565]
               Start testing find_bit() with random-filled bitmap
[   46.195127] find_next_bit:                 2440833 ns, 163112 iterations
[   46.204226] find_next_zero_bit:            2372128 ns, 164569 iterations
[   46.213152] find_last_bit:                 2199779 ns, 163112 iterations
[   46.299398] find_first_bit:               79526013 ns,  16234 iterations
[   46.684026] find_first_and_bit:          377912990 ns,  32617 iterations
[   46.692020] find_next_and_bit:             1269071 ns,  73562 iterations
[   46.698745]
               Start testing find_bit() with sparse bitmap
[   46.705711] find_next_bit:                  118652 ns,    656 iterations
[   46.716621] find_next_zero_bit:            4183472 ns, 327025 iterations
[   46.723395] find_last_bit:                   50448 ns,    656 iterations
[   46.762308] find_first_bit:               32190802 ns,    656 iterations
[   46.769093] find_first_and_bit:              52129 ns,      1 iterations
[   46.775882] find_next_and_bit:               62522 ns,      1 iterations

Generic implementation:

[   25.149238]
               Start testing find_bit() with random-filled bitmap
[   25.160002] find_next_bit:                 2640943 ns, 163537 iterations
[   25.169567] find_next_zero_bit:            2838485 ns, 164144 iterations
[   25.178595] find_last_bit:                 2302372 ns, 163538 iterations
[   25.204016] find_first_bit:               18697630 ns,  16373 iterations
[   25.602571] find_first_and_bit:          391841480 ns,  32555 iterations
[   25.610563] find_next_and_bit:             1260306 ns,  73587 iterations
[   25.617295]
               Start testing find_bit() with sparse bitmap
[   25.624222] find_next_bit:                   70289 ns,    656 iterations
[   25.636478] find_next_zero_bit:            5527050 ns, 327025 iterations
[   25.643253] find_last_bit:                   52147 ns,    656 iterations
[   25.657304] find_first_bit:                7328573 ns,    656 iterations
[   25.664087] find_first_and_bit:              48518 ns,      1 iterations
[   25.670871] find_next_and_bit:               59750 ns,      1 iterations

Overall, I would say it's pretty similar (some generic perform
marginally better, some native perform marginally better) with the
exception of find_first_bit() being much better with the generic
implementation, but find_next_zero_bit() being noticably worse.

So, pretty much nothing of any relevance between them, which may
come as a surprise given the byte vs word access differences between
the two implementations.

I suspect the reason behind that may be because the native
implementation code is smaller than the generic implementation,
outweighing the effects of the by-byte rather than by-word. I would
also suspect that, because of the smaller implementation, the native
version performs better in a I$-cool situation than the generic. Lastly,
I would suspect if we fixed the bug in the native version, and converted
it to use word loads, it would probably be better than the generic
version. I haven't anything to base that on other than gut feeling at
the moment, but I can make the changes to the native implementation and
see what effect that has, possibly tomorrow.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!



[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux