Re: Why does __builtin_ctz clear eax on amd64 targets

On Tue, Oct 3, 2017 at 9:59 PM, Mason <slash.tmp@xxxxxxx> wrote:

> On 03/10/2017 19:09, David Wohlferd wrote:
>
> > On 10/3/2017 6:53 AM, Mason wrote:
> >
> >> Consider the following code:
> >>
> >> int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }
> >>
> >> which "gcc-7 -O -S -march=skylake" compiles to:
> >>
> >> my_ctz:
> >>      xorl    %eax, %eax
> >>      tzcntl  %edi, %eax
> >>      ret
> >>
> >> I don't understand why GCC clears eax before executing tzcnt.
> >> (Actually, this happens for other built-ins as well: clz, popcount.)
> >>
> >> tzcnt (or bsf) will write their result to eax.
> >>
> >> http://www.felixcloutier.com/x86/TZCNT.html
> >> http://www.felixcloutier.com/x86/BSF.html
> >>
> >> Does it have to do with partial register write stalls?
> >> Probably not, because the zero-ing remains even when the call
> >> is inlined, and gcc "sees" there are no partial register writes.
> >
> > Quoting from the docs on tzcnt:
> >
> > "in the case of BSF instruction, if source operand is zero, the
> > content of destination operand are undefined. On processors that do
> > not support TZCNT, the instruction byte encoding is executed as BSF."
> >
> > So BSF leaves the contents of eax undefined, and TZCNT might execute as
> > BSF.  Given the trivial nature of xor eax, eax, this seems a sensible
> > precaution.
>
> Hello David,
>
> Your answer makes sense, but falls apart given the following:
>
> As I stated, "gcc-7 -O -S -march=skylake" generates
>
> my_ctz:
>         xorl    %eax, %eax
>         tzcntl  %edi, %eax
>         ret
>
> But "gcc-7 -O -S -march=barcelona" generates
>
> my_ctz:
>         bsfl    %edi, %eax
>         ret
>
>
> AMD Barcelona does not support tzcnt, yet GCC doesn't clear
> eax before executing bsf. The mystery remains :-)
>

It might be because of the workaround for this hardware problem (a false
data dependency on the destination register, reported there for popcnt):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
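
For anyone who wants to compare the generated code for the other built-ins
Mason mentioned, here is a minimal sketch. Only my_ctz comes from the
thread; my_clz, my_popcount, safe_ctz and check_examples are hypothetical
names added for illustration. If the false-dependency explanation is right,
the xor is there only to break the dependency on the old value of eax, not
to define the result. Separately, __builtin_ctz(0) is undefined at the C
level, so safe_ctz guards the zero case explicitly.

```c
#include <assert.h>
#include <limits.h>

/* From the thread: count trailing zero bits (undefined for arg == 0). */
int my_ctz(unsigned int arg) { return __builtin_ctz(arg); }

/* Hypothetical companions for the other built-ins mentioned above. */
int my_clz(unsigned int arg) { return __builtin_clz(arg); }
int my_popcount(unsigned int arg) { return __builtin_popcount(arg); }

/* Hypothetical zero-safe wrapper: returns the bit width for arg == 0,
   so the result never depends on what bsf/tzcnt does with a zero source. */
int safe_ctz(unsigned int arg)
{
    const int width = (int)(sizeof arg * CHAR_BIT);
    return arg ? __builtin_ctz(arg) : width;
}

/* Small self-check of the wrappers above. */
void check_examples(void)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    assert(my_ctz(8u) == 3);          /* 0b1000 has three trailing zeros */
    assert(my_clz(1u) == width - 1);  /* only the lowest bit is set */
    assert(my_popcount(0xF0u) == 4);  /* four bits set */
    assert(safe_ctz(0u) == width);    /* guarded zero case */
}
```

Compiling this with "gcc-7 -O -S" and the two -march values quoted above
should make it easy to see which of these functions get the extra xor on
each target.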


-- 
Regards,
   Mikhail Maltsev



