On 5/7/08, Jason Garrett-Glaser <darkshikari@xxxxxxxxx> wrote:
> it doesn't unroll loops that would be highly beneficial to unroll.
> for( i = 0; i < 3; i++ )
> {
>
>     for( l=0; l<2; l++ )
>     {
>         for( j=0; j<h->mb.pic.i_fref[l]; j++ )
>         {
>             if( i == 0 )
>                 for( k = 1; k < 4; k++ )
>         }
>     }
> }
>
> Unrolling this loop saves over 100 clocks out of a total of about 620
> on my system

1) Which loop? I can see three nested ones.

2) man gcc; /unroll; n n n n :)

3) On your particular CPU, maybe, but on others aggressive unrolling can make the code suddenly several times slower once the unrolled loop no longer fits in the instruction cache.

4) Why not rewrite the code to hoist i and l out of the inner loops where they are constant, rather than relying on the compiler to spot the constants and unroll the loops? That would give an advantage on all compilers and processors, rather than shaving 10 or 20% off on your specific platform and compiler.