Re: bitwise & optimization

Hi!

This is very similar to what I posted around June 2007, and to something Linus Torvalds posted about 6 months later. As I remember it, one of the GCC team members gave us the middle finger: they simply wanted to keep Intel ahead of AMD in terms of speed and to make sure that GCC couldn't rival other compilers in speed (that being the implication of not doing this optimization in branchy code).

For my chess program Diep I've posted about even more horrible optimizations. GCC has a tendency to put such branches in places where I know that falling through is going to give lots of mispredicted branches (as the total number of branches is too large for the processor's memory). In other places GCC managed to mess up even further:

causing it to generate a jump to the end of the function and then back, with the target also lying outside the AMD instruction lookahead, which really is slower than generating a few CMOV-type instructions or using fewer branches.
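To make the CMOV point concrete, here is a minimal sketch of the same selection written three ways. The function names and the clamping task are made up for illustration and are not taken from Diep. Whether GCC actually emits a jump or a CMOV for the first two depends on version, flags and target, so check the assembly output (e.g. with -S) rather than trusting the source form.

```cpp
#include <cstdint>

// Branchy form: the compiler is free to emit a conditional jump here.
int32_t clamp_branchy(int32_t score, int32_t limit) {
    if (score > limit) {
        return limit;
    }
    return score;
}

// Select form: a side-effect-free ternary, which compilers often
// (but not always) turn into a CMOV on x86-64.
int32_t clamp_select(int32_t score, int32_t limit) {
    return (score > limit) ? limit : score;
}

// Mask form: pure arithmetic, so the source itself contains no branch
// for the compiler to lay out badly.
int32_t clamp_mask(int32_t score, int32_t limit) {
    // mask is all ones when score > limit, all zeros otherwise.
    const uint32_t mask = 0u - static_cast<uint32_t>(score > limit);
    const uint32_t s = static_cast<uint32_t>(score);
    const uint32_t l = static_cast<uint32_t>(limit);
    return static_cast<int32_t>((l & mask) | (s & ~mask));
}
```

All three return the same value for the same inputs; only the code the compiler is likely to generate differs.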

Not rewriting this ugly part of GCC is the reason why Intel C++ is roughly 10-15% faster than GCC, especially in 64-bit mode, and why the generated code runs faster on Intel than on AMD processors: the instruction lookahead on Intel is larger, whereas OBJECTIVELY the generated code is a lot SLOWER.

Ideally you want statistics, generated with whatever GCC offers nowadays (something like -fprofile-generate), so that really every branch can be parameterized.
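For reference, GCC's profile feedback is driven by the -fprofile-generate and -fprofile-use flags. The sketch below shows the usual build sequence in comments, plus a per-branch static hint via __builtin_expect (a GCC/Clang builtin) for cases where no profile is available. The file name, binary name and function are hypothetical, and how much either mechanism helps on a given program is something to measure, not assume.

```cpp
// Build steps with GCC (file and binary names are hypothetical):
//   g++ -O3 -fprofile-generate search.cpp -o search   instrumented build
//   ./search                                          run on representative input; writes .gcda files
//   g++ -O3 -fprofile-use search.cpp -o search        rebuild using the measured branch counts

#include <cstddef>
#include <cstdint>

// Hypothetical hot loop: counts the entries above a threshold.
// __builtin_expect(expr, 0) tells the compiler the condition is expected
// to be false most of the time, which can influence block layout.
std::size_t count_above(const int32_t* values, std::size_t n, int32_t threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (__builtin_expect(values[i] > threshold, 0)) {
            ++count;
        }
    }
    return count;
}
```

The hint only affects code generation, never the result: count_above returns the same count whether the hint is right or wrong.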

Yet GCC seems to find a lot of ways to mess things up before such optimizations can take effect.

If it parameterized that, it would be a compiler that generates code that is really, objectively fast, whereas right now it is dog slow for branchy code.

Kind Regards,

Vincent Diepeveen
The Netherlands


On Tue, 9 Jun 2015, Fisnik Kastrati wrote:

To whom it may concern,

I'm turning to you with regard to an unwanted optimization that g++ (v. 4.8.2) is generating; see the code at the following link:
http://goo.gl/3NVjyc

The assembly code generated for the two methods "amp" and "ampamp" is practically the same when using the optimization flag "-O3". However, I'm interested in having a single jump for the code in the method "amp", as the branch misprediction penalty is very high otherwise. Is there any optimization flag that I should set in order to avoid this feature when using "-O3"? I.e., I'd like generated code similar to that of icc 13.
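The code behind the link is not reproduced in this message. As a rough illustration only, a pair of functions of the kind described might look like the following. Only the names "amp" and "ampamp" come from the text above; the parameters and the range test are assumptions.

```cpp
// Hypothetical reconstruction, NOT the code behind the link.

// Bitwise '&': both comparisons are always evaluated and their results
// combined, so a single test followed by a single jump would suffice.
bool amp(int x, int lo, int hi) {
    return (x >= lo) & (x <= hi);
}

// Logical '&&': the second comparison is only needed when the first is
// true, which maps naturally onto two conditional jumps.
bool ampamp(int x, int lo, int hi) {
    return (x >= lo) && (x <= hi);
}
```

Because neither operand has side effects, the two functions always return the same result, and the compiler is allowed to pick either code shape for both of them.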


Thank you in advance





