Am 01.12.2016 um 01:57 schrieb Michel Dänzer: > On 01/12/16 02:52 AM, Jochen Rollwagen wrote: >> Am 29.11.2016 um 08:32 schrieb Michel Dänzer: >>> On 29/11/16 03:18 AM, Jochen Rollwagen wrote: >>>> This commit replaces the loop for calculating log base 2 for >>>> non-x86-platforms in radeon.h with a clz (count leading zeroes)-based >>>> version to simplify the code and, well, eliminate the loop. >>>> Note: Thereâ??s no check for val=0 case, since x86-bsr is undefined for >>>> that case too, that should be okay. >>>> --- >>>> src/radeon.h | 7 +++---- >>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/src/radeon.h b/src/radeon.h >>>> index cbc7866..b1a1ce0 100644 >>>> --- a/src/radeon.h >>>> +++ b/src/radeon.h >>>> @@ -933,17 +933,16 @@ enum { >>>> static __inline__ int >>>> RADEONLog2(int val) >>>> { >>>> - int bits; >>>> #if (defined __i386__ || defined __x86_64__) && (defined __GNUC__) >>>> + int bits; >>>> + >>>> __asm volatile("bsrl %1, %0" >>>> : "=r" (bits) >>>> : "c" (val) >>>> ); >>>> return bits; >>>> #else >>>> - for (bits = 0; val != 0; val >>= 1, ++bits) >>>> - ; >>>> - return bits - 1; >>>> + return (31 - __builtin_clz(val)); >>>> #endif >>>> } >>> Any reason for not using __builtin_clz on x86 as well? AFAICT both gcc >>> and clang seem to generate more or less the same code with that as with >>> the inline assembly. >>> >>> >> I guess not. According to >> http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz >> "bsr and clz are related but different. >> >> On x86 for clz gcc (-O2) generates: >> >> bsrl %edi, %eax >> xorl $31, %eax >> ret " > That's not what I'm seeing. Have you compared the code generated by your > compiler in both cases? > > Well, I'm on a powerpc platform, the generated code looks like this: RADEONLog2: stwu %r1,-16(%r1) cntlzw %r3,%r3 subfic %r3,%r3,31 addi %r1,%r1,16 blr Straightforward enough. But i guess youâ??re right, the generated code is similar enough to justify a general approach without platform-specific inline assembler and leaving the rest to the compiler. I'll post V2 of the patch.