On 29.11.2016 at 08:32, Michel Dänzer wrote:
> On 29/11/16 03:18 AM, Jochen Rollwagen wrote:
>> This commit replaces the loop for calculating log base 2 for
>> non-x86-platforms in radeon.h with a clz (count leading zeroes)-based
>> version to simplify the code and, well, eliminate the loop.
>> Note: There's no check for val=0 case, since x86-bsr is undefined for
>> that case too, that should be okay.
>> ---
>>  src/radeon.h | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/radeon.h b/src/radeon.h
>> index cbc7866..b1a1ce0 100644
>> --- a/src/radeon.h
>> +++ b/src/radeon.h
>> @@ -933,17 +933,16 @@ enum {
>>  static __inline__ int
>>  RADEONLog2(int val)
>>  {
>> -    int bits;
>>  #if (defined __i386__ || defined __x86_64__) && (defined __GNUC__)
>> +    int bits;
>> +
>>      __asm volatile("bsrl %1, %0"
>>          : "=r" (bits)
>>          : "c" (val)
>>      );
>>      return bits;
>>  #else
>> -    for (bits = 0; val != 0; val >>= 1, ++bits)
>> -        ;
>> -    return bits - 1;
>> +    return (31 - __builtin_clz(val));
>>  #endif
>>  }
> Any reason for not using __builtin_clz on x86 as well? AFAICT both gcc
> and clang seem to generate more or less the same code with that as with
> the inline assembly.
>

I guess not. According to
http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz

"bsr and clz are related but different. On x86 for clz gcc (-O2) generates:

    bsrl %edi, %eax
    xorl $31, %eax
    ret
"
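
For reference, here is a minimal sketch comparing the removed loop with the clz-based replacement. It assumes GCC or clang (for __builtin_clz) and a 32-bit int. The names log2_loop, log2_clz and check_log2_equivalence are hypothetical and not part of radeon.h. For any count in 0..31, 31 - count equals count ^ 31, which is how the "xorl $31" in the quoted output relates to the subtraction in the patch.

```c
#include <assert.h>

/* Loop-based fallback that the patch removes (only meaningful for val > 0). */
static int log2_loop(int val)
{
    int bits;

    for (bits = 0; val != 0; val >>= 1, ++bits)
        ;
    return bits - 1;
}

/* clz-based replacement from the patch; the result is undefined for val == 0. */
static int log2_clz(int val)
{
    return 31 - __builtin_clz(val);
}

/* Hypothetical helper: assert that both versions agree for 1..limit. */
static void check_log2_equivalence(int limit)
{
    int val;

    for (val = 1; val <= limit; val++)
        assert(log2_loop(val) == log2_clz(val));
}
```

Calling check_log2_equivalence(1 << 16) exercises every positive value up to 65536. Zero and negative inputs are left out on purpose: __builtin_clz(0) is undefined, and the loop version relies on val eventually shifting down to zero.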