Re: slowdown with -std=gnu18 with respect to -std=c99

On Fri, 6 May 2022, Alexander Monakov wrote:

> The primary issue here is false dependency on vcvtss2sd instruction. In the
> snippet shown in Stéphane's email, the slower variant begins with
> 
>     vcvtss2sd   -0x4(%rsp),%xmm1,%xmm1
> 
> The cvtss2sd instruction is specified to take the upper bits of SSE register
> unmodified, so here it merges high bits of xmm1 with results of float->double
> conversion (in low bits) into new xmm1. Unless the CPU can track dependencies
> separately for vector register components, it has to delay this instruction
> until the previous computation that modified xmm1 has completed (AMD Zen2 is
> an example of a microarchitecture that apparently can).

For future reference, my statement in parentheses was a bit inaccurate: Zen 2
avoids the false dependency only if xmm1 carries all zeroes in its high bits
after being idiomatically zeroed (i.e. via pxor). Thanks to Andreas Abel for
pointing out this limitation.

(Nevertheless, the "blessed" state seemingly survives context switches, so
it's quite useful, including for this testcase.)
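
To make the zeroing idiom concrete, here is a minimal sketch in C with SSE2
intrinsics. It is not taken from the testcase: the helper name
float_to_double_nodep is made up for illustration. The point is only that the
register receiving the conversion result is zeroed first, so the merge into
its high bits does not have to wait on an earlier computation. Whether a given
compiler keeps the explicit zeroing, and whether a given CPU treats it as
dependency-breaking, are assumptions here rather than guarantees.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */

/* Hypothetical helper: convert a float to a double through a freshly
 * zeroed destination register. */
static double float_to_double_nodep(float f)
{
    __m128d zero = _mm_setzero_pd();        /* zeroing idiom (xor of a register with itself) */
    __m128  src  = _mm_set_ss(f);           /* f in the low lane, other lanes zero */
    __m128d res  = _mm_cvtss_sd(zero, src); /* low = (double)f, high bits taken from zero */
    return _mm_cvtsd_f64(res);              /* extract the low double */
}
```

The result is the same as a plain (double)f cast; only the dependency on the
previous contents of the destination register is meant to differ.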

Alexander


