Re: Best runtime optimization -O2?

In typical applications, execution time is concentrated in critical inner loops that make up a tiny fraction of total code size. Even with the code-size expansion from -O3, such loops are likely to fit in the L1 cache, so they get faster as a result of the optimizer's fairly aggressive time-over-space decisions.

Because of cache effects, almost all the code in a project will get slower as a result of time-over-space optimization choices. But making the critical inner loops faster may matter more for total execution speed, enough to more than offset the slowdown of everything else.
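To make that tradeoff concrete, here is a rough back-of-the-envelope model. The symbols are my own and the numbers are purely illustrative, not measurements: f is the fraction of run time spent in the critical loops, s is the speedup those loops get, and k is the slowdown factor applied to everything else.

```latex
\frac{T_{\text{new}}}{T_{\text{old}}} = \frac{f}{s} + (1 - f)\,k

% Illustrative values only: f = 0.9,\ s = 1.5,\ k = 1.1
\frac{T_{\text{new}}}{T_{\text{old}}} = \frac{0.9}{1.5} + 0.1 \times 1.1 = 0.60 + 0.11 = 0.71

% Break-even slowdown for the rest of the code:
k_{\text{max}} = \frac{1 - f/s}{1 - f} = \frac{1 - 0.6}{0.1} = 4
```

Under those assumed values, the program as a whole still runs faster unless the non-critical code gets four times slower. The picture changes quickly as f drops, which is why knowing where the time actually goes matters.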

I'd be much happier if the optimizer offered options to be more cache conscious: not exactly choosing space over speed, but choosing speed with the understanding that instruction cache misses are likely, so smaller code will execute faster. Even with such options, the coder (or profile-guided optimization, if you believe in that) must somehow tag the critical loops where cache misses won't dominate performance.
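One way to do that kind of tagging by hand in GCC is with the `hot` and `cold` function attributes. The following is only a minimal sketch: `sum_squares`, `report_error`, and the `HOT_FN`/`COLD_FN` macros are hypothetical names made up for illustration, and how much the attributes actually change the generated code depends on the compiler version, the target, and the other flags in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The hot/cold attributes are GCC extensions (Clang accepts them too).
 * Fall back to nothing on other compilers. */
#if defined(__GNUC__)
#define HOT_FN  __attribute__((hot))
#define COLD_FN __attribute__((cold))
#else
#define HOT_FN
#define COLD_FN
#endif

/* Hypothetical critical inner loop: tagged hot so the compiler may
 * optimize it more aggressively and group it with other hot code. */
HOT_FN long sum_squares(const int *data, size_t n)
{
    assert(n == 0 || data != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)data[i] * data[i];
    return total;
}

/* Hypothetical rarely executed path: tagged cold so the compiler may
 * favor size over speed and keep it away from the hot code. */
COLD_FN int report_error(const char *msg)
{
    return fprintf(stderr, "error: %s\n", msg);
}
```

The idea would be to build the bulk of the project with a size-leaning setting such as -Os or -O2 and mark only the few functions a profiler identifies as critical. Whether that beats a blanket -O3 is something to measure on the real workload, not assume.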

Rainer Gerhards wrote:
I am wondering which optimization options offer me the best
runtime performance (speed of execution) on modern hardware.

The traditional thinking of time vs. space optimization is no longer
true due to CPU caches. Often, smaller code is more runtime efficient,
because the cache hit rates are much higher and that outweighs the
negative effects of jumps.


