Hi Alexander, On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> wrote: > While I was working on converting some structure fields from a fixed > type to a bitmap, I started observing code size increase not only in > places where the code works with the converted structure fields, but > also where the converted vars were on the stack. That said, the > following code: > > DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1]; > unsigned long bar = BIT(BAR_BIT); > unsigned long baz = 0; > > __set_bit(FOO_BIT, foo); > baz |= BIT(BAZ_BIT); > > BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)); > BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT)); > BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT)); > > triggers the first assertion on x86_64, which means that the > compiler is unable to evaluate it to a compile-time initializer > when the architecture-specific bitop is used even if it's obvious. > I found that this is due to that many architecture-specific > non-atomic bitop implementations use inline asm or other hacks which > are faster or more robust when working with "real" variables (i.e. > fields from the structures etc.), but the compilers have no clue how > to optimize them out when called on compile-time constants. > > So, in order to let the compiler optimize out such cases, expand the > test_bit() and __*_bit() definitions with a compile-time condition > check, so that they will pick the generic C non-atomic bitop > implementations when all of the arguments passed are compile-time > constants, which means that the result will be a compile-time > constant as well and the compiler will produce more efficient and > simple code in 100% cases (no changes when there's at least one > non-compile-time-constant argument). > The condition itself: > > if ( > __builtin_constant_p(nr) && /* <- bit position is constant */ > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is > always either NULL or not */ > addr && /* <- bitmap addr is not NULL */ > __builtin_constant_p(*addr) /* <- compiler knows the value of > the target bitmap */ > ) > /* then pick the generic C variant > else > /* old code path, arch-specific > > I also tried __is_constexpr() as suggested by Andy, but it was > always returning 0 ('not a constant') for the 2,3 and 4th > conditions. > > The savings on x86_64 with LLVM are insane (.text): > > $ scripts/bloat-o-meter -c vmlinux.{base,test} > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885) > > $ scripts/bloat-o-meter -c vmlinux.{base,mod} > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947) > > $ scripts/bloat-o-meter -c vmlinux.{base,all} > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) Thank you! I gave it a try on m68k, and am a bit disappointed seeing an increase in code size: add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092) This is atari_defconfig on a tree based on v5.19-rc1, with m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds