David Bruant wrote:
> Finally, I would like to know some reasons why unrolling loops could
> make the code slower.
> And maybe we (I) could write to the people who write the manual
> and ask them to add what is said here, because I find the
> "may or may not" quite weak for a manual.

Try a few experiments before jumping to your conclusions. I have, and
unrolling loops usually makes those loops slower.
It is a complicated situation and you may find that the option to unroll
loops makes the total program faster despite making most loops slower (a
few inner loops that took a lot of time might get faster while loops
that took less time get slower). But even that much is far from
certain. The total program might get slower.
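
To make that arithmetic concrete, here is a small sketch in C. The struct, the function names and every number in example_profile are invented for illustration. They are not measurements of any real program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile entry: seconds spent in one loop,
 * before and after unrolling. */
struct loop_time {
    double before;
    double after;
};

/* Made-up numbers: one hot inner loop that gets faster and
 * four cheap loops that each get a little slower. */
static const struct loop_time example_profile[] = {
    { 8.0, 6.0 },   /* hot inner loop: faster */
    { 0.5, 0.6 },   /* cheap loop: slower */
    { 0.5, 0.6 },
    { 0.5, 0.6 },
    { 0.5, 0.6 },
};

/* How many loops got slower after unrolling. */
size_t slower_count(const struct loop_time *loops, size_t n)
{
    size_t count = 0;
    assert(loops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (loops[i].after > loops[i].before)
            count++;
    return count;
}

/* Change in total time (after minus before); negative means the
 * program as a whole got faster. */
double total_change(const struct loop_time *loops, size_t n)
{
    double change = 0.0;
    assert(loops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        change += loops[i].after - loops[i].before;
    return change;
}
```

With these invented numbers, four of the five loops get slower, yet the total drops from 10.0 to about 8.4 seconds. Swap the hot loop's two values and the same option makes the whole program slower.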
Depending on details of compiler behavior that I don't know for gcc,
there might be much stronger reasons than the following for loops to get
slower when unrolled, but the following is sometimes enough:
1) Modern CPUs overlap a lot of work, so all the counting and jumping
involved in a loop might happen to be fully overlapped and free, so
there is nothing to be saved by unrolling.
2) By unrolling, you always make the code larger, which gives the L1
instruction cache more work to do. Depending on complex issues such as
the instruction mix and decode overheads, the cost of fetching all those
extra instructions might outweigh everything else. So after saving
nothing because of factor (1), you then pay a lot because of factor (2).
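
If you want to try the experiment yourself, something like the following is a reasonable starting point. It is only a sketch: the function names sum_rolled, sum_unrolled4 and time_sum are my own, the unrolling is done by hand rather than by the compiler, and clock() is a coarse timer. The compiler may also transform either version on its own, so look at the generated assembly (for example with gcc -S) before trusting any timing.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Rolled: each iteration does one add plus the counter update,
 * compare and branch. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: a quarter of the compares and branches,
 * but a larger loop body plus a cleanup loop for the remainder. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)   /* leftover 0 to 3 elements */
        s += a[i];
    return s;
}

/* CPU seconds taken by `reps` calls of f over the same array.
 * The accumulated sum is stored through `result` so the calls
 * are not obviously dead code. */
double time_sum(long (*f)(const int *, size_t),
                const int *a, size_t n, int reps, long *result)
{
    long r = 0;
    clock_t start;

    assert(f != NULL && result != NULL);
    start = clock();
    for (int k = 0; k < reps; k++)
        r += f(a, n);
    *result = r;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_sum once with each function on the same array, at the same optimization level, and compare the two results. Repeat with different array sizes and on different machines; the answer may well change, which is the point.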