On 01/12/16 02:52 AM, Jochen Rollwagen wrote:
> On 29.11.2016 at 08:32, Michel Dänzer wrote:
>> On 29/11/16 03:18 AM, Jochen Rollwagen wrote:
>>> This commit replaces the loop for calculating log base 2 for
>>> non-x86-platforms in radeon.h with a clz (count leading zeroes)-based
>>> version to simplify the code and, well, eliminate the loop.
>>> Note: There's no check for val=0 case, since x86-bsr is undefined for
>>> that case too, that should be okay.
>>> ---
>>>  src/radeon.h | 7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/radeon.h b/src/radeon.h
>>> index cbc7866..b1a1ce0 100644
>>> --- a/src/radeon.h
>>> +++ b/src/radeon.h
>>> @@ -933,17 +933,16 @@ enum {
>>>  static __inline__ int
>>>  RADEONLog2(int val)
>>>  {
>>> -    int bits;
>>>  #if (defined __i386__ || defined __x86_64__) && (defined __GNUC__)
>>> +    int bits;
>>> +
>>>      __asm volatile("bsrl %1, %0"
>>>          : "=r" (bits)
>>>          : "c" (val)
>>>      );
>>>      return bits;
>>>  #else
>>> -    for (bits = 0; val != 0; val >>= 1, ++bits)
>>> -        ;
>>> -    return bits - 1;
>>> +    return (31 - __builtin_clz(val));
>>>  #endif
>>>  }
>> Any reason for not using __builtin_clz on x86 as well? AFAICT both gcc
>> and clang seem to generate more or less the same code with that as with
>> the inline assembly.
>>
>>
> I guess not. According to
> http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz
> "bsr and clz are related but different.
>
> On x86 for clz gcc (-O2) generates:
>
> bsrl %edi, %eax
> xorl $31, %eax
> ret "

That's not what I'm seeing. Have you compared the code generated by your
compiler in both cases?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
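
For comparing the two variants discussed in this thread, here is a minimal
sketch. It is not the code from radeon.h: the names log2_loop, log2_clz and
check_log2 are hypothetical, the argument is unsigned (the patch uses int),
and a val == 0 guard is added that the patch deliberately omits. It assumes a
compiler that provides __builtin_clz (gcc or clang). The check at the end
also exercises the identity 31 - n == n ^ 31 for n in 0..31, which is why a
compiler may be able to reduce "31 - clz" to a bare bsr on x86 instead of the
bsr/xor pair quoted for plain clz.

```c
#include <assert.h>
#include <limits.h>

/* Loop-based variant, equivalent to the code the patch removes.
 * Returns floor(log2(val)), or -1 for val == 0. */
static int log2_loop(unsigned int val)
{
    int bits;

    for (bits = 0; val != 0; val >>= 1, ++bits)
        ;
    return bits - 1;
}

/* clz-based variant. __builtin_clz(0) is undefined, so this sketch
 * guards that case explicitly (an addition, not part of the patch). */
static int log2_clz(unsigned int val)
{
    if (val == 0)
        return -1;
    return (int)(sizeof(val) * CHAR_BIT) - 1 - __builtin_clz(val);
}

/* Cross-check both variants on every power of two and the value just
 * below it, and verify that subtracting from 31 equals XOR with 31. */
static void check_log2(void)
{
    int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    int i, n;

    for (i = 0; i < width; i++) {
        unsigned int p = 1u << i;

        assert(log2_clz(p) == i);
        assert(log2_loop(p) == i);
        assert(log2_clz(p - 1) == i - 1);
        assert(log2_loop(p - 1) == i - 1);
    }

    for (n = 0; n < 32; n++)
        assert(31 - n == (n ^ 31));
}
```

Building this with "gcc -O2 -S" and reading the assembly for log2_clz next to
the inline-asm version is one way to do the comparison asked about above.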