From: Yury Norov <yury.norov@xxxxxxxxx> Date: Wed, 24 Feb 2021 07:44:16 -0800 > On Wed, Feb 24, 2021 at 11:52:55AM +0000, Alexander Lobakin wrote: > > From: Yury Norov <yury.norov@xxxxxxxxx> > > Date: Sat, 5 Dec 2020 08:54:06 -0800 > > > > Hi, > > > > > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't > > > enable it in config. It leads to using find_next_bit() which is less > > > efficient: > > > > > > 0000000000000000 <find_first_bit>: > > > 0: aa0003e4 mov x4, x0 > > > 4: aa0103e0 mov x0, x1 > > > 8: b4000181 cbz x1, 38 <find_first_bit+0x38> > > > c: f9400083 ldr x3, [x4] > > > 10: d2800802 mov x2, #0x40 // #64 > > > 14: 91002084 add x4, x4, #0x8 > > > 18: b40000c3 cbz x3, 30 <find_first_bit+0x30> > > > 1c: 14000008 b 3c <find_first_bit+0x3c> > > > 20: f8408483 ldr x3, [x4], #8 > > > 24: 91010045 add x5, x2, #0x40 > > > 28: b50000c3 cbnz x3, 40 <find_first_bit+0x40> > > > 2c: aa0503e2 mov x2, x5 > > > 30: eb02001f cmp x0, x2 > > > 34: 54ffff68 b.hi 20 <find_first_bit+0x20> // b.pmore > > > 38: d65f03c0 ret > > > 3c: d2800002 mov x2, #0x0 // #0 > > > 40: dac00063 rbit x3, x3 > > > 44: dac01063 clz x3, x3 > > > 48: 8b020062 add x2, x3, x2 > > > 4c: eb02001f cmp x0, x2 > > > 50: 9a829000 csel x0, x0, x2, ls // ls = plast > > > 54: d65f03c0 ret > > > > > > ... > > > > > > 0000000000000118 <_find_next_bit.constprop.1>: > > > 118: eb02007f cmp x3, x2 > > > 11c: 540002e2 b.cs 178 <_find_next_bit.constprop.1+0x60> // b.hs, b.nlast > > > 120: d346fc66 lsr x6, x3, #6 > > > 124: f8667805 ldr x5, [x0, x6, lsl #3] > > > 128: b4000061 cbz x1, 134 <_find_next_bit.constprop.1+0x1c> > > > 12c: f8667826 ldr x6, [x1, x6, lsl #3] > > > 130: 8a0600a5 and x5, x5, x6 > > > 134: ca0400a6 eor x6, x5, x4 > > > 138: 92800005 mov x5, #0xffffffffffffffff // #-1 > > > 13c: 9ac320a5 lsl x5, x5, x3 > > > 140: 927ae463 and x3, x3, #0xffffffffffffffc0 > > > 144: ea0600a5 ands x5, x5, x6 > > > 148: 54000120 b.eq 16c <_find_next_bit.constprop.1+0x54> // b.none > > > 14c: 1400000e b 184 <_find_next_bit.constprop.1+0x6c> > > > 150: d346fc66 lsr x6, x3, #6 > > > 154: f8667805 ldr x5, [x0, x6, lsl #3] > > > 158: b4000061 cbz x1, 164 <_find_next_bit.constprop.1+0x4c> > > > 15c: f8667826 ldr x6, [x1, x6, lsl #3] > > > 160: 8a0600a5 and x5, x5, x6 > > > 164: eb05009f cmp x4, x5 > > > 168: 540000c1 b.ne 180 <_find_next_bit.constprop.1+0x68> // b.any > > > 16c: 91010063 add x3, x3, #0x40 > > > 170: eb03005f cmp x2, x3 > > > 174: 54fffee8 b.hi 150 <_find_next_bit.constprop.1+0x38> // b.pmore > > > 178: aa0203e0 mov x0, x2 > > > 17c: d65f03c0 ret > > > 180: ca050085 eor x5, x4, x5 > > > 184: dac000a5 rbit x5, x5 > > > 188: dac010a5 clz x5, x5 > > > 18c: 8b0300a3 add x3, x5, x3 > > > 190: eb03005f cmp x2, x3 > > > 194: 9a839042 csel x2, x2, x3, ls // ls = plast > > > 198: aa0203e0 mov x0, x2 > > > 19c: d65f03c0 ret > > > > > > ... > > > > > > 0000000000000238 <find_next_bit>: > > > 238: a9bf7bfd stp x29, x30, [sp, #-16]! > > > 23c: aa0203e3 mov x3, x2 > > > 240: d2800004 mov x4, #0x0 // #0 > > > 244: aa0103e2 mov x2, x1 > > > 248: 910003fd mov x29, sp > > > 24c: d2800001 mov x1, #0x0 // #0 > > > 250: 97ffffb2 bl 118 <_find_next_bit.constprop.1> > > > 254: a8c17bfd ldp x29, x30, [sp], #16 > > > 258: d65f03c0 ret > > > > > > Enabling this functions would also benefit for_each_{set,clear}_bit(). > > > Would it make sense to enable this config for all such architectures by > > > default? > > > > I confirm that GENERIC_FIND_FIRST_BIT also produces more optimized and > > fast code on MIPS (32 R2) where there is also no architecture-specific > > bitsearching routines. > > So, if it's okay for other folks, I'd suggest to go for it and enable > > for all similar arches. > > As far as I understand the idea of GENERIC_FIND_FIRST_BIT=n, it's > intended to save some space in .text. But in fact it bloats the > kernel: > > yury:linux$ scripts/bloat-o-meter vmlinux vmlinux.ffb > add/remove: 4/1 grow/shrink: 19/251 up/down: 564/-1692 (-1128) > ... Same for MIPS, enabling GENERIC_FIND_FIRST_BIT saves a bunch of .text memory despite that it introduces new entries. > For the next cycle, I'm going to submit a patch that removes the > GENERIC_FIND_FIRST_BIT completely and forces all architectures to > use find_first{_zero}_bit() I like that idea. I'm almost sure there'll be no arch that benefits from CONFIG_GENERIC_FIND_FIRST_BIT=n (and has no arch-optimized versions). > > (otherwise, I'll publish a separate entry for mips-next after 5.12-rc1 > > release and mention you in "Suggested-by:") > > I think it worth to enable GENERIC_FIND_FIRST_BIT for mips and arm now > and see how it works for people. If there'll be no complains I'll remove > the config entirely. I'm OK if you submit the patch for mips now, or we > can make a series and submit together. Works either way. Lez make a series and see how it goes. I'll send you MIPS part soon. Al