On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:

> > I can reproduce a difference, but in my case it's simply because in -std=gnuXX
> > mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few
> > steps in the benchmarked function to use fma instead of separate mul/add
> > instructions.
>
> but then you should get better (i.e. smaller) timings with -std=gnuXX than
> with -std=cXX, instead of worse timings as we get?

Right, for me -std=gnuXX is faster. But for you it's slower by almost 1.5x,
that's quite a lot and should be easy to spot on 'perf report' profile.

> > perf record -e instructions:P -c 500000 ./perf ...
>
> thank you, we'll investigate that.

Good luck! I'm curious what you'll find, please let me know.

Alexander