Hi!
This is very similar to my post around June 2007, and to something similar
that Linus Torvalds posted 6 months later, around 2007. I remember one of
the GCC team members showing the middle finger; the message was that they
simply wanted to keep Intel ahead of AMD in terms of speed and make sure
that GCC couldn't rival other compilers in terms of speed (that is the
implication of not doing this optimization in branchy code).
For my chess program Diep I've posted even more horrible cases.
GCC has the tendency to also put such branches where I know myself that
fall-through is going to give lots of mispredicted branches (as the total
number of branches is too much for the processor's memory). In other
places GCC managed to mess up even further: it generated a jump to the
end of the function and then back, and the target was also, instruction
wise, outside of the AMD instruction lookahead. That really is slower
than generating a few CMOV-type instructions or using fewer branches.
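To make the CMOV point concrete, here is a minimal sketch with hypothetical function names (this is not code from Diep). It shows a branchy select next to a branch-free one built from a mask. Whether GCC turns either form into a CMOV or into a conditional jump depends on the compiler version, the flags and the target, so treat it as an illustration only and check the generated assembly yourself.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form: the compiler is free to emit a conditional jump here,
// which costs a pipeline flush whenever it is mispredicted.
inline std::uint32_t select_branchy(bool cond, std::uint32_t a, std::uint32_t b) {
    if (cond) {
        return a;
    }
    return b;
}

// Branch-free form: mask is all ones when cond is true and zero otherwise,
// so the result is picked with plain AND/OR and no jump is needed.
inline std::uint32_t select_branchless(bool cond, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}

// Small usage check: both forms must agree for either value of cond.
inline void check_select() {
    assert(select_branchy(true, 3u, 7u) == select_branchless(true, 3u, 7u));
    assert(select_branchy(false, 3u, 7u) == select_branchless(false, 3u, 7u));
}
```

The branch-free form always does the same small amount of work, which tends to pay off only when the condition is hard to predict.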
Not rewriting this ugly part of the GCC compiler is the reason why Intel
C++ is roughly 10-15% faster than GCC, especially in 64 bits, and why the
generated code runs faster on Intel than on AMD processors, as the
instruction lookahead is larger there, whereas OBJECTIVELY the generated
code is a lot SLOWER.
Ideally you really want statistics, generated with whatever GCC has
nowadays, like -fprofile-generate, so that really every branch can get
parameterized.
Yet GCC seems to mess things up in a lot of ways before such
optimizations can take part.
If it would parameterize that, it would be a compiler that can generate
code that's really objectively fast, whereas it's duck slow right now for
branchy code.
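As a rough sketch of what giving the compiler per-branch information can look like today: GCC can collect branch statistics with -fprofile-generate and apply them with -fprofile-use, and a single branch can be annotated by hand with __builtin_expect. The function name and the assumption that the condition is rarely true are hypothetical, chosen only for illustration, and none of this guarantees a particular code layout.

```cpp
#include <cassert>

// GCC/Clang builtin: tells the compiler which outcome is expected.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical hot loop: counts the values above a threshold, under the
// assumption (for this example only) that such values are rare.
//
// Profile-driven alternative to the manual hint:
//   g++ -O2 -fprofile-generate prog.cpp -o prog   (instrumented build)
//   ./prog                                        (run a typical workload)
//   g++ -O2 -fprofile-use prog.cpp -o prog        (rebuild using the profile)
inline int count_above(const int* values, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (UNLIKELY(values[i] > threshold)) {
            ++count;
        }
    }
    return count;
}

// Small usage check: the hint must not change the result.
inline void check_count_above() {
    const int values[] = {1, 5, 2, 9};
    assert(count_above(values, 4, 4) == 2);
}
```

The hint only influences how the compiler arranges the code; the computed result is the same with or without it.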
Kind Regards,
Vincent Diepeveen
The Netherlands
On Tue, 9 Jun 2015, Fisnik Kastrati wrote:
To whom it may concern,
I'm turning to you with regards to an unwanted optimization that g++ (v.
4.8.2) is generating, see the code in the following link:
http://goo.gl/3NVjyc
The assembly code generated for both methods "amp" and "ampamp" is
practically the same when using the optimization flag "-O3". However, I'm
interested to have a single jump for the code in the method "amp", as branch
misprediction penalty is very high otherwise. Is there any optimization flag
that I should set, in order to avoid this feature when using "-O3"? I.e., I'd
like a generated code similar to icc 13.
Thank you in advance
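For readers who cannot open the link, the following is a guess at the shape of the two methods. It is a hypothetical reconstruction, not the code behind the URL: the parameter names and bodies are assumptions, and only the names "amp" and "ampamp" come from the message. It shows the bitwise & form, which needs no short-circuit and so can be compiled to a single branch, next to the logical && form.

```cpp
#include <cassert>

// Hypothetical stand-in for "amp": bitwise & evaluates both comparisons
// unconditionally, so one conditional jump on the combined result suffices.
inline int amp(int a, int b, int lo, int hi) {
    if ((a > lo) & (b < hi)) {
        return 1;
    }
    return 0;
}

// Hypothetical stand-in for "ampamp": logical && short-circuits, which a
// compiler may implement as two separate conditional jumps.
inline int ampamp(int a, int b, int lo, int hi) {
    if ((a > lo) && (b < hi)) {
        return 1;
    }
    return 0;
}

// Small usage check: both forms return the same results.
inline void check_amp() {
    assert(amp(5, 1, 0, 10) == ampamp(5, 1, 0, 10));
    assert(amp(-1, 1, 0, 10) == ampamp(-1, 1, 0, 10));
    assert(amp(5, 20, 0, 10) == ampamp(5, 20, 0, 10));
}
```

Because the comparisons here have no side effects, the two functions are observably equivalent, and the compiler is allowed to generate the same code for both. That matches the behaviour described in the message.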