Re: Why does __builtin_ctz clear eax on amd64 targets

Mason <slash.tmp@xxxxxxx> · Thu, 5 Oct 2017 11:43:50 +0200

On 05/10/2017 01:33, Mikhail Maltsev wrote:

> On Tue, Oct 3, 2017 at 9:59 PM, Mason wrote:
>
>> On 03/10/2017 19:09, David Wohlferd wrote:
>>
>>> On 10/3/2017 6:53 AM, Mason wrote:
>>>
>>>> Consider the following code:
>>>>
>>>> int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }
>>>>
>>>> which "gcc-7 -O -S -march=skylake" compiles to:
>>>>
>>>> my_ctz:
>>>>      xorl    %eax, %eax
>>>>      tzcntl  %edi, %eax
>>>>      ret
>>>>
>>>> I don't understand why GCC clears eax before executing tzcnt.
>>>> (Actually, this happens for other built-ins as well: clz, popcount.)
>>>>
>>>> tzcnt (or bsf) writes its result to eax.
>>>>
>>>> http://www.felixcloutier.com/x86/TZCNT.html
>>>> http://www.felixcloutier.com/x86/BSF.html
>>>>
>>>> Does it have to do with partial register write stalls?
>>>> Probably not, because the zeroing remains even when the call
>>>> is inlined, and gcc "sees" there are no partial register writes.
>>>
>>> Quoting from the docs on tzcnt:
>>>
>>> "in the case of BSF instruction, if source operand is zero, the
>>> content of destination operand are undefined. On processors that do
>>> not support TZCNT, the instruction byte encoding is executed as BSF."
>>>
>>> So BSF leaves the contents of eax undefined, and TZCNT might execute as
>>> BSF.  Given the trivial nature of xor eax, eax, this seems a sensible
>>> precaution.
>>
>> Hello David,
>>
>> Your answer makes sense, but falls apart given the following:
>>
>> As I stated, "gcc-7 -O -S -march=skylake" generates
>>
>> my_ctz:
>>         xorl    %eax, %eax
>>         tzcntl  %edi, %eax
>>         ret
>>
>> But "gcc-7 -O -S -march=barcelona" generates
>>
>> my_ctz:
>>         bsfl    %edi, %eax
>>         ret
>>
>>
>> AMD Barcelona does not support tzcnt, yet GCC doesn't clear
>> eax before executing bsf. The mystery remains :-)
>
> It might be because of the workaround for this hardware problem:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

Hello Mikhail,

I think you've hit the nail on the head! :-)
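
For the archive, here is a minimal sketch of the kind of code where
such a workaround could matter. It assumes my reading of that PR is
right: on some Intel cores, popcnt (and lzcnt/tzcnt) carry a false
dependency on their destination register, and zeroing the register
first breaks it. I have not measured this myself. sum_ctz is a
hypothetical helper added for illustration; only my_ctz comes from
the original question.

```c
#include <assert.h>
#include <stddef.h>

/* Same function as in the original question. */
int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }

/*
 * Hypothetical helper: sum the trailing-zero counts of an array.
 * Each count depends only on v[i]. If the instruction also had to
 * wait for the old value of its destination register (the assumed
 * false dependency), successive iterations could be serialized for
 * no architectural reason. A preceding "xorl %eax, %eax" would
 * remove that wait.
 */
unsigned long sum_ctz(const unsigned int *v, size_t n)
{
    unsigned long sum = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* __builtin_ctz(0) is undefined, so skip zero entries. */
        if (v[i] != 0)
            sum += (unsigned long)my_ctz(v[i]);
    }
    return sum;
}
```

Comparing the output of "gcc-7 -O2 -S" for this file with
-march=skylake and with -march=barcelona should show whether the xor
is still emitted once my_ctz is inlined into the loop.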

Regards.