On 2/24/25 07:24, Uros Bizjak wrote:
> On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>> Refactor parity calculations to use the standard parity8() helper. This
>> change eliminates redundant implementations and improves code
>> efficiency.
>
> The patch improves parity assembly code in bootflag.o from:
>
>   58:	89 de                	mov    %ebx,%esi
>   5a:	b9 08 00 00 00       	mov    $0x8,%ecx
>   5f:	31 d2                	xor    %edx,%edx
>   61:	89 f0                	mov    %esi,%eax
>   63:	89 d7                	mov    %edx,%edi
>   65:	40 d0 ee             	shr    %sil
>   68:	83 e0 01             	and    $0x1,%eax
>   6b:	31 c2                	xor    %eax,%edx
>   6d:	83 e9 01             	sub    $0x1,%ecx
>   70:	75 ef                	jne    61 <sbf_init+0x51>
>   72:	39 c7                	cmp    %eax,%edi
>   74:	74 7f                	je     f5 <sbf_init+0xe5>
>   76:
>
> to:
>
>   54:	89 d8                	mov    %ebx,%eax
>   56:	ba 96 69 00 00       	mov    $0x6996,%edx
>   5b:	c0 e8 04             	shr    $0x4,%al
>   5e:	31 d8                	xor    %ebx,%eax
>   60:	83 e0 0f             	and    $0xf,%eax
>   63:	0f a3 c2             	bt     %eax,%edx
>   66:	73 64                	jae    cc <sbf_init+0xbc>
>   68:
>
> which is faster and smaller (-10 bytes) code.
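
For reference, the shorter sequence corresponds to a nibble-folding lookup: the high nibble is XORed into the low nibble, and the result indexes the 16-bit constant 0x6996, whose bit n holds the parity of the 4-bit value n. A minimal userspace sketch of that technique follows; the name parity8_sketch() and the stdint types are assumptions for illustration, not the kernel's own definition:

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for an 8-bit parity helper.
 * Returns 1 if val has an odd number of set bits, 0 otherwise.
 */
static inline int parity8_sketch(uint8_t val)
{
	/* Fold the high nibble onto the low nibble; parity is preserved. */
	val ^= val >> 4;

	/* 0x6996 = 0110 1001 1001 0110b: bit n is the parity of n (0..15). */
	return (0x6996 >> (val & 0xf)) & 1;
}
```

The "bt %eax,%edx" in the listing is the same table lookup: it tests bit (val & 0xf) of 0x6996 and leaves the answer in the carry flag.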
Of course, on x86, parity8() and parity16() can be implemented very
simply using the parity flag. (Also, the parity functions really ought
to return bool, and be flagged __attribute_const__.)

static inline __attribute_const__ bool _arch_parity8(u8 val)
{
	bool parity;

	/* PF is set for an even bit count, so "np" yields true for odd. */
	asm("and %1,%1" : "=@ccnp" (parity) : "q" (val));
	return parity;
}

static inline __attribute_const__ bool _arch_parity16(u16 val)
{
	bool parity;

	/* Fold the high byte into the low byte; PF covers the low byte only. */
	asm("xor %h1,%b1" : "=@ccnp" (parity), "+Q" (val));
	return parity;
}

In the generic algorithm, you probably should implement parity16() in
terms of parity8(), parity32() in terms of parity16() and so on:

static inline __attribute_const__ bool parity16(u16 val)
{
#ifdef ARCH_HAS_PARITY16
	/* Let the compiler fold constants; otherwise use the arch version. */
	if (!__builtin_constant_p(val))
		return _arch_parity16(val);
#endif
	return parity8(val ^ (val >> 8));
}

This picks up the architectural versions when available.
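
To illustrate the "and so on" part, here is a rough userspace sketch of the whole chain, with each width folding its upper half into its lower half and deferring to the next narrower helper. The *_generic names and the stdint types are hypothetical, chosen only so the snippet compiles on its own; the ARCH_HAS_* hooks are left out for brevity:

```c
#include <stdbool.h>
#include <stdint.h>

/* Base case: nibble fold plus lookup in the 16-entry parity table 0x6996. */
static inline bool parity8_generic(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Each wider helper XORs its two halves and defers to the narrower one. */
static inline bool parity16_generic(uint16_t val)
{
	return parity8_generic((uint8_t)(val ^ (val >> 8)));
}

static inline bool parity32_generic(uint32_t val)
{
	return parity16_generic((uint16_t)(val ^ (val >> 16)));
}

static inline bool parity64_generic(uint64_t val)
{
	return parity32_generic((uint32_t)(val ^ (val >> 32)));
}
```

With this layering, an architecture that only provides the 8-bit or 16-bit primitive still benefits at every wider width, since each level would check for its own arch override before falling through.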
Furthermore, if a popcnt instruction is known to exist, then the parity
is simply popcnt(x) & 1.
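
As a sketch of the popcnt variant, assuming GCC or Clang builtins in a userspace build (in kernel code the population count would more likely come from a helper such as hweight32(); the function names below are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Parity as the low bit of the population count. */
static inline bool parity32_popcnt(uint32_t val)
{
	return __builtin_popcount(val) & 1;
}

/* The compilers also expose parity directly as a builtin. */
static inline bool parity32_builtin(uint32_t val)
{
	return __builtin_parity(val);
}
```

Whether either form compiles down to a single popcnt instruction depends on the target flags, so this only pays off when the instruction is known to be available.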
-hpa