In typical applications the execution time is concentrated in critical
inner loops that represent a tiny fraction of total code size. Even
with the expansion in code size from -O3 such loops are likely to fit in
L1 cache and thus get faster as a result of fairly aggressive time over
space decisions in the optimizer.
Because of cache effects, almost all the code in a project will get
slower when the optimizer trades space for time. But making the
critical inner loops faster may improve total execution speed by more
than enough to offset slowing everything else down.
I'd be much happier if the optimizer had some options to be more cache
conscious (not exactly choosing space over speed, but choosing speed
with the understanding that instruction-cache misses are likely, so
smaller code will execute faster). Even with such options, the coder
(or profile-guided optimization, if you believe in that) must somehow
tag the critical loops where cache misses won't dominate the
performance.
Rainer Gerhards wrote:
I am wondering which optimization options offer me the best
runtime performance (speed of execution) on modern hardware.
The traditional thinking of time vs. space optimization is no longer
true due to CPU caches. Often, smaller code is more runtime efficient,
because the cache hit rates are much higher and that outweighs the
negative effects of jumps.