On Mon, 3 Mar 2025 10:47:20 +0800 Kuan-Wei Chiu <visitorckw@xxxxxxxxx> wrote: > On Sun, Mar 02, 2025 at 07:09:54PM +0000, David Laight wrote: > > On Mon, 3 Mar 2025 01:29:19 +0800 > > Kuan-Wei Chiu <visitorckw@xxxxxxxxx> wrote: > > > > > Hi Yury, > > > ... > > > #define parity(val) \ > > > ({ \ > > > __auto_type __v = (val); \ > > > bool __ret; \ > > > switch (BITS_PER_TYPE(val)) { \ > > > case 64: \ > > > __v ^= __v >> 16 >> 16; \ > > > fallthrough; \ > > > case 32: \ > > > __v ^= __v >> 16; \ > > > fallthrough; \ > > > case 16: \ > > > __v ^= __v >> 8; \ > > > fallthrough; \ > > > case 8: \ > > > __v ^= __v >> 4; \ > > > __ret = (0x6996 >> (__v & 0xf)) & 1; \ > > > break; \ > > > default: \ > > > BUILD_BUG(); \ > > > } \ > > > __ret; \ > > > }) > > > > I'm seeing double-register shifts for 64bit values on 32bit systems. > > And gcc is doing 64bit double-register maths all the way down. > > > > That is fixed by changing the top of the define to > > #define parity(val) \ > > ({ \ > > unsigned int __v = (val); \ > > bool __ret; \ > > switch (BITS_PER_TYPE(val)) { \ > > case 64: \ > > __v ^= val >> 16 >> 16; \ > > fallthrough; \ > > > > But it's need changing to only expand 'val' once. > > Perhaps: > > auto_type _val = (val); > > u32 __ret = val; > > and (mostly) s/__v/__ret/g > > > I'm happy to make this change, though I'm a bit confused about how much > we care about the code generated by gcc. So this is the macro expected > in v3: There is 'good', 'bad' and 'ugly' - it was in the 'bad' to 'ugly' area. > > #define parity(val) \ > ({ \ > __auto_type __v = (val); \ > u32 __ret = val; \ > switch (BITS_PER_TYPE(val)) { \ > case 64: \ > __ret ^= __v >> 16 >> 16; \ > fallthrough; \ > case 32: \ > __ret ^= __ret >> 16; \ > fallthrough; \ > case 16: \ > __ret ^= __ret >> 8; \ > fallthrough; \ > case 8: \ > __ret ^= __ret >> 4; \ > __ret = (0x6996 >> (__ret & 0xf)) & 1; \ > break; \ > default: \ > BUILD_BUG(); \ > } \ > __ret; \ > }) That looks like it will avoid double-register shifts on 32bit archs. arm64 can do slightly better (a couple of instructions) because of its barrel shifter. x86 can do a lot better because of the cpu 'parity' flag. But maybe it is never used anywhere that really matters. David