On 2017/5/2 14:41, Marc Glisse wrote:
> _mm_set_sd is not a NOP, it sets the upper part of the SSE register to
> 0, which is done with movq in recent versions but through the stack on
> older versions. In order to optimize that away, the compiler needs to
> know that the upper part of the registers is ignored (it isn't ignored
> by max, it is _mm_cvtsd_f64 afterwards that drops anything that depended
> on it). But the maxsd operation is largely opaque to the compiler for
> now (modeled in an unnaturally complicated way), so it does not notice
> it. Clang does a better job there... Feel free to file a bug report at
> https://gcc.gnu.org/bugzilla/ if you don't already see a similar one in
> the database.

Someone has already reported it: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708>
GCC 7 does produce better code, although the two movq instructions in
my_fmax_1 are still redundant: <https://godbolt.org/g/uBhRDN>
```assembly
my_fmax_1(double, double):
movq xmm1, xmm1
movq xmm0, xmm0
maxsd xmm0, xmm1
ret
my_fmax_2(double, double):
maxsd xmm0, xmm1
ret
```
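For anyone reading without following the godbolt link, here is a minimal sketch of the kind of source that leads to a listing like the one above. The two function bodies are my assumption about the shape of the code, not a copy of it: my_fmax_1 goes through the intrinsics discussed above, and my_fmax_2 is a plain C comparison written to match what maxsd computes (it returns the second operand when the comparison is false, including the NaN case).

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_set_sd, _mm_max_sd, _mm_cvtsd_f64 */

/* Assumed intrinsic-based version. Each _mm_set_sd puts its argument in
   the low lane and zeroes the high lane; that zeroing is what shows up
   as the extra movq instructions. */
double my_fmax_1(double x, double y)
{
    __m128d a = _mm_set_sd(x);
    __m128d b = _mm_set_sd(y);
    /* _mm_cvtsd_f64 reads only the low lane, so the high lanes are dead. */
    return _mm_cvtsd_f64(_mm_max_sd(a, b));
}

/* Assumed plain C version with the same operand order as maxsd:
   the result is x if x > y, otherwise y. */
double my_fmax_2(double x, double y)
{
    return x > y ? x : y;
}
```

Both functions return the same value for ordinary inputs; the only difference the compiler has to see through is that the zeroed high lanes in my_fmax_1 never reach the result.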
Thank you all the same for the information. I am now keeping an eye on
PR70721.
--
Best regards,
ltpmouse