Re: slowdown with -std=gnu18 with respect to -std=c99

On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:

> Does anyone have a clue?

I can reproduce a difference, but in my case it's simply because in -std=gnuXX
mode (as opposed to -std=cXX) GCC enables FMA contraction (-ffp-contract=fast is
the default in GNU modes, -ffp-contract=off in ISO modes), which allows the last
few steps of the benchmarked function to use fma instead of separate mul/add
instructions.
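
A minimal sketch of why contraction can change both the generated code and the
results: a separate multiply and add round twice, while a fused multiply-add
rounds once. The function names are hypothetical. To avoid depending on libm,
the single-rounding result is computed through double instead of calling fmaf
(the product of two floats is exact in double); float is used for brevity, and
the same reasoning applies to double.

```c
/* Two roundings: the product is rounded to float, then the sum is rounded.
   The volatile temporary forces the product to be stored as a float, so GCC
   cannot contract this into an FMA even with -ffp-contract=fast. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    return p + c;
}

/* Single-rounding reference: a float product (24-bit significands) fits
   exactly in a double (53-bit significand). For the inputs used below, every
   intermediate double is exact, so the only rounding is the final conversion
   to float, as with a fused multiply-add. */
static float fused_reference(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Plain expression: whether GCC turns this into an FMA instruction depends on
   -ffp-contract (fast by default with -std=gnuXX, off with -std=cXX), on
   optimization being enabled, and on the target having FMA instructions. */
static float plain(float a, float b, float c)
{
    return a * b + c;
}
```

With a = 1 + 2^-23, b = 1 - 2^-23 and c = -1 (exact a*b = 1 - 2^-46),
mul_then_add returns 0 while fused_reference returns -0x1p-46. What plain
returns depends on whether GCC contracted it, so building once with -std=gnu18
and once with -std=c99 (for example at -O2 with -march=haswell on x86-64)
should show the two different answers.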

(Regarding __builtin_expect: it also makes a small difference in my case. GCC
seems to generate some redundant code without it, but the effect is about 10x
smaller than the difference between having and not having FMA.)
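
For reference, __builtin_expect(exp, c) is a GCC builtin that returns exp and
tells the optimizer that exp is expected to equal c, which mainly affects block
layout around the branch. A small sketch assuming the common LIKELY/UNLIKELY
macro idiom; the macro and function names are hypothetical and are not taken
from the benchmark under discussion.

```c
#include <stddef.h>

/* Hypothetical convenience macros built on GCC's __builtin_expect(exp, c),
   which returns exp and hints that exp is expected to equal c. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array whose entries should be non-negative;
   a negative entry is treated as a rare error and reported as -1. */
static long sum_nonneg(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint that the error branch is rarely taken, so GCC can keep the
           loop body on the fall-through (hot) path. */
        if (UNLIKELY(v[i] < 0))
            return -1;
        s += v[i];
    }
    return s;
}
```

Whether such a hint changes the generated code, and by how much, depends on the
GCC version and flags; comparing the disassembly with and without it is the way
to check.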

I think you might be able to figure it out on your end if you run both variants
under 'perf stat', note how the cycle and instruction counts change, and then
look at the disassembly to see what changed. You can use 'perf record' and 'perf
report' to easily see the hot code path; if you do that, I'd recommend running
it with the same sampling period in both cases, for example:

    perf record -e instructions:P -c 500000 ./perf ...

Alexander


