At 09:50 AM 12/15/2006, Greg Smith wrote:
On Fri, 15 Dec 2006, Merlin Moncure wrote:
The slowdown is probably due to the unroll-loops switch, which can
actually hurt code due to the larger footprint (worse cache locality).
The cache issues are so important with current processors that I'd
suggest throwing -Os (optimize for size) into the mix of options people
test. That one may stack usefully with -O2, but probably not with
-O3 (-O3 includes optimizations that increase code size).
-Os
Optimize for size. -Os enables all -O2 optimizations that do not
typically increase code size. It also performs further optimizations
designed to reduce code size.
-Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops -falign-labels
-freorder-blocks -freorder-blocks-and-partition
-fprefetch-loop-arrays
-ftree-vect-loop-version
Hmmm. That list of disabled flags bears thought.
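One way to weigh that list empirically is to build with plain -Os and then
with -Os plus one disabled group switched back on at a time. Below is a
minimal Python sketch that only generates the CFLAGS strings. The group
names and the os_variants() helper are made up for illustration, and it
assumes that an explicit -f flag given alongside -Os overrides what the
level turns off, which is worth verifying on your GCC version.

```python
# Flag groups that the GCC manual text quoted above says -Os disables.
# The group labels ("align", "reorder", ...) are hypothetical names.
DISABLED_BY_OS = {
    "align": [
        "-falign-functions",
        "-falign-jumps",
        "-falign-loops",
        "-falign-labels",
    ],
    "reorder": ["-freorder-blocks", "-freorder-blocks-and-partition"],
    "prefetch": ["-fprefetch-loop-arrays"],
    "vect-version": ["-ftree-vect-loop-version"],
}


def os_variants(groups=DISABLED_BY_OS):
    """Return {label: CFLAGS}: plain -Os, then -Os with one group re-enabled."""
    variants = {"Os": "-Os"}
    for name, flags in groups.items():
        # Assumption: explicit -f flags win over the -Os level defaults.
        variants["Os+" + name] = " ".join(["-Os"] + flags)
    return variants


if __name__ == "__main__":
    for label, cflags in os_variants().items():
        print(f"{label:16s} CFLAGS='{cflags}'")
```

Each string can be handed to the build as CFLAGS and the same benchmark
rerun, so any difference can be pinned on a single group of flags.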
-falign-functions -falign-jumps -falign-loops -falign-labels
1= Most RISC CPUs' performance is very sensitive to misalignment
issues. I would not recommend turning these off.
-freorder-blocks
Reorder basic blocks in the compiled function in order to reduce
number of taken branches and improve code locality.
Enabled at levels -O2, -O3.
-freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function, in
order to reduce number of taken branches, partitions hot and cold
basic blocks into separate sections of the assembly and .o files, to
improve paging and cache locality performance.
This optimization is automatically turned off in the presence of
exception handling, for link once sections, for functions with a
user-defined section attribute and on any architecture that does not
support named sections.
2= Most RISC CPUs are cranky about branchy code and (lack of) cache
locality. Wouldn't suggest punting these either.
-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch
memory to improve the performance of loops that access large arrays.
This option may generate better or worse code; results are highly
dependent on the structure of loops within the source code.
3= OTOH, this one looks worth experimenting with turning off.
-ftree-vect-loop-version
Perform loop versioning when doing loop vectorization on trees. When
a loop appears to be vectorizable except that data alignment or data
dependence cannot be determined at compile time then vectorized and
non-vectorized versions of the loop are generated along with runtime
checks for alignment or dependence to control which version is
executed. This option is enabled by default except at level -Os where
it is disabled.
4= ...and this one looks like a 50/50 shot.
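To make the trade-off concrete, here is a hand-written Python model of what
loop versioning amounts to: two copies of the same loop plus a runtime check
that picks one. It is only a sketch. GCC does this internally on its tree
representation, the scale_copy() function and its arguments are invented for
the example, and a slice assignment stands in for the vectorized loop.

```python
def scale_copy(buf, dst, src, n, k):
    """Compute buf[dst+i] = buf[src+i] * k for i in range(n).

    Returns which loop version ran, so the choice is visible to the caller.
    """
    # Runtime dependence check: do the source and destination ranges overlap?
    disjoint = dst + n <= src or src + n <= dst
    if disjoint:
        # "Vectorized" version: read the whole source range, then store it.
        # This is only safe because no store can feed a later load.
        buf[dst:dst + n] = [x * k for x in buf[src:src + n]]
        return "batched"
    # Fallback version: the original one-element-at-a-time loop, which
    # preserves the exact ordering of loads and stores.
    for i in range(n):
        buf[dst + i] = buf[src + i] * k
    return "scalar"


if __name__ == "__main__":
    a = [1, 2, 3, 0, 0, 0]
    print(scale_copy(a, 3, 0, 3, 2), a)  # ranges do not overlap
    b = [1, 2, 3, 4, 5]
    print(scale_copy(b, 1, 0, 4, 2), b)  # ranges overlap
```

The cost is visible even in the model: both loop bodies are kept, plus the
check, which is the code-size growth -Os is trying to avoid.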
Ron Peacetree