Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they will pick the generic C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well and the compiler will produce more efficient and
simple code in 100% cases (no changes when there's at least one
non-compile-time-constant argument).

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark[0]):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

Hmm... with *this version* of the series, I'm not getting results nearly as
good as that when building defconfig atop v5.19-rc3:

  GCC 8.5.0:   add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
  GCC 9.3.0:   add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
  GCC 10.3.0:  add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
  GCC 11.1.0:  add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
  GCC 11.3.0:  add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
  GCC 12.1.0:  add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)

  LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
  LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
  LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)

... and that now seems to be regressing codegen with recent versions of GCC as
much as it improves it LLVM.

I'm not sure if we've improved some other code and removed the benefit between
v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
look as compelling as it did.

Overall that's mostly hidden in the Image size, due to 64K alignment and
padding requirements:

  Toolchain      Before      After       Difference

  GCC 8.5.0      36178432    36178432    0
  GCC 9.3.0      36112896    36112896    0
  GCC 10.3.0     36442624    36377088    -65536
  GCC 11.1.0     36311552    36377088    +65536
  GCC 11.3.0     36311552    36311552    0
  GCC 12.1.0     36377088    36377088    0

  LLVM 12.0.1    31418880    31418880    0
  LLVM 13.0.1    31418880    31418880    0
  LLVM 14.0.0    31218176    31218176    0

... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
change overall (due to 64KiB alignment restrictions for portions of the kernel
Image).

Thanks,
Mark.



[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux