(CC: Alexey Klimov)

On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> >  	select GENERIC_CPU_AUTOPROBE
> >  	select GENERIC_CPU_VULNERABILITIES
> >  	select GENERIC_EARLY_IOREMAP
> > +	select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will

On Cortex-A53, find_first_bit() is almost twice as fast as find_next_bit(),
according to lib/find_bit_benchmark: 27193319 ns vs 47389224 ns on a
random-filled bitmap and 11082437 ns vs 19048193 ns on a sparse bitmap
(about 1.7x in both cases). Thanks to Alexey for testing.

Yury

---

Tested-by: Alexey Klimov <aklimov@xxxxxxxxxx>

Without GENERIC_FIND_FIRST_BIT (baseline):

Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit: 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit: 9591974 ns, 163401 iterations
[7126084.893859] find_last_bit: 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit: 47389224 ns, 16357 iterations
[7126084.958975] find_next_and_bit: 3875849 ns, 73487 iterations
[7126084.965884] Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit: 109879 ns, 655 iterations
[7126084.999365] find_next_zero_bit: 18968440 ns, 327026 iterations
[7126085.006351] find_last_bit: 80503 ns, 655 iterations
[7126085.032315] find_first_bit: 19048193 ns, 655 iterations
[7126085.039303] find_next_and_bit: 82628 ns, 1 iterations

With GENERIC_FIND_FIRST_BIT enabled:

Start testing find_bit() with random-filled bitmap
[   84.095335] find_next_bit: 9600970 ns, 163770 iterations
[   84.111695] find_next_zero_bit: 9613137 ns, 163911 iterations
[   84.124143] find_last_bit: 5713907 ns, 163770 iterations
[   84.158068] find_first_bit: 27193319 ns, 16406 iterations
[   84.168663] find_next_and_bit: 3863814 ns, 73671 iterations
[   84.175392] Start testing find_bit() with sparse bitmap
[   84.182660] find_next_bit: 112334 ns, 656 iterations
[   84.208375] find_next_zero_bit: 18976981 ns, 327025 iterations
[   84.215184] find_last_bit: 79584 ns, 656 iterations
[   84.233005] find_first_bit: 11082437 ns, 656 iterations
[   84.239821] find_next_and_bit: 82209 ns, 1 iterations

root@pine:~# cpupower -c all frequency-info | grep asserted
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)

root@pine:~# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1152.0000
CPU min MHz: 648.0000
BogoMIPS: 48.00
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
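To illustrate why routing find_first_bit() through find_next_bit(addr, size, 0) can cost more than a dedicated word scan, here is a simplified userspace sketch. It is not the kernel's lib/find_bit.c, whose code differs in detail. All sketch_* names are hypothetical, and the GCC/Clang builtin __builtin_ctzl() stands in for the kernel's __ffs(). The dedicated version simply scans whole words from index 0. The general version must also validate a start offset, compute the starting word, and mask off bits below the start before it can begin scanning.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Sketch of a dedicated find_first_bit(): scan whole words from index 0. */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
                                           unsigned long size)
{
    for (unsigned long idx = 0; idx * BITS_PER_LONG < size; idx++) {
        if (addr[idx]) {
            /* __builtin_ctzl() is only called on a non-zero word. */
            unsigned long bit = idx * BITS_PER_LONG +
                                (unsigned long)__builtin_ctzl(addr[idx]);
            /* Clamp: bits past 'size' in the last word don't count. */
            return bit < size ? bit : size;
        }
    }
    return size; /* no bit set */
}

/* Sketch of find_next_bit(): must honour an arbitrary start offset. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long start)
{
    if (start >= size)
        return size;

    unsigned long idx = start / BITS_PER_LONG;
    /* Mask off bits below 'start' in the first word. */
    unsigned long word = addr[idx] & (~0UL << (start % BITS_PER_LONG));

    while (!word) {
        idx++;
        if (idx * BITS_PER_LONG >= size)
            return size;
        word = addr[idx];
    }

    unsigned long bit = idx * BITS_PER_LONG +
                        (unsigned long)__builtin_ctzl(word);
    return bit < size ? bit : size;
}

/*
 * Sketch of the fallback used when the arch does not provide its own
 * find_first_bit(): the general search starting at bit 0. (The kernel
 * uses a macro here; a function keeps the sketch simple.)
 */
static unsigned long sketch_find_first_bit_fallback(const unsigned long *addr,
                                                    unsigned long size)
{
    return sketch_find_next_bit(addr, size, 0);
}
```

Both paths return the same index. For a 4-word bitmap with only bit 5 of word 2 set, each returns 2 * BITS_PER_LONG + 5. The difference lies in the extra per-call setup on the fallback path. How much that matters on real hardware depends on the compiler and the CPU, as the A53 numbers and Will's clang 11 disassembly comparison suggest.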