Re: [PATCH v2 01/18] lib/parity: Add __builtin_parity() fallback implementations

Kuan-Wei Chiu <visitorckw@xxxxxxxxx> · Tue, 4 Mar 2025 00:54:30 +0800



On Mon, Mar 03, 2025 at 10:43:28AM -0500, Yury Norov wrote:
> On Mon, Mar 03, 2025 at 10:47:20AM +0800, Kuan-Wei Chiu wrote:
> > > > #define parity(val)					\
> > > > ({							\
> > > > 	__auto_type __v = (val);			\
> > > > 	bool __ret;					\
> > > > 	switch (BITS_PER_TYPE(val)) {			\
> > > > 	case 64:					\
> > > > 		__v ^= __v >> 16 >> 16;			\
> > > > 		fallthrough;				\
> > > > 	case 32:					\
> > > > 		__v ^= __v >> 16;			\
> > > > 		fallthrough;				\
> > > > 	case 16:					\
> > > > 		__v ^= __v >> 8;			\
> > > > 		fallthrough;				\
> > > > 	case 8:						\
> > > > 		__v ^= __v >> 4;			\
> > > > 		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > > > 		break;					\
> > > > 	default:					\
> > > > 		BUILD_BUG();				\
> > > > 	}						\
> > > > 	__ret;						\
> > > > })
> > > 
> > > I'm seeing double-register shifts for 64bit values on 32bit systems.
> > > And gcc is doing 64bit double-register maths all the way down.
> > > 
> > > That is fixed by changing the top of the define to
> > > #define parity(val)					\
> > > ({							\
> > > 	unsigned int __v = (val);			\
> > > 	bool __ret;					\
> > > 	switch (BITS_PER_TYPE(val)) {			\
> > > 	case 64:					\
> > > 		__v ^= val >> 16 >> 16;			\
> > > 		fallthrough;				\
> > > 
> > > But it's need changing to only expand 'val' once.
> > > Perhaps:
> > > 	auto_type _val = (val);
> > > 	u32 __ret = val;
> > > and (mostly) s/__v/__ret/g
> > >
> > I'm happy to make this change, though I'm a bit confused about how much
> > we care about the code generated by gcc. So this is the macro expected
> > in v3:
> 
> We do care about code generated by any compiler. But we don't spread
> hacks here and there just to make GCC happy. This is entirely broken
> strategy. Things should work the other way: compiler people should
> collect real-life examples and learn from them.
> 
> I'm not happy even with this 'v >> 16 >> 16' hack, I just think that
> disabling Wshift-count-overflow is the worse option. Hacking the macro
> to optimize parity64() on 32-bit arch case doesn't worth it entirely.
> 
> In your patchset, you have only 3 drivers using parity64(). For each
> of them, please in the commit message refer that calling generic
> parity() with 64-bit argument may lead to sub-optimal code generation
> with a certain compiler against 32-bit arches. If you'll get a
> feedback that it's a real problem for somebody, we'll think about
> mitigating it. 
>
How about reconsidering using parity8/16/32/64() instead of adding a
parity() macro? They allow compiler to generate correct code without
any hacks, and each implementation is simple and just one line. Jiri
also agreed in the previous thread that we need parity8() in cases like
the i3c driver. I think this might be the easiest solution to satisfy
most people?

Regards,
Kuan-Wei