Re: New to PostgreSQL, performance considerations

Ron <rjpeace@xxxxxxxxxxxxx> · Fri, 15 Dec 2006 11:53:28 -0500

At 09:50 AM 12/15/2006, Greg Smith wrote:
On Fri, 15 Dec 2006, Merlin Moncure wrote:

The slower is probably due to the unroll loops switch which can 
actually hurt code due to the larger footprint (less cache coherency).

The cache issues are so important with current processors that I'd 
suggest throwing -Os (optimize for size) into the mix people 
test.  That one may stack usefully with -O2, but probably not with 
-O3 (3 includes optimizations that increase code size).

-Os
Optimize for size. -Os enables all -O2 optimizations that do not 
typically increase code size. It also performs further optimizations 
designed to reduce code size.

-Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops -falign-labels
-freorder-blocks -freorder-blocks-and-partition
-fprefetch-loop-arrays
-ftree-vect-loop-version

Hmmm.  That list of disabled flags bears thought.

-falign-functions -falign-jumps -falign-loops -falign-labels

1= Most RISC CPUs performance is very sensitive to misalignment 
issues.  Not recommended to turn these off.

-freorder-blocks
Reorder basic blocks in the compiled function in order to reduce 
number of taken branches and improve code locality.

Enabled at levels -O2, -O3.
-freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function, in 
order to reduce number of taken branches, partitions hot and cold 
basic blocks into separate sections of the assembly and .o files, to 
improve paging and cache locality performance.

This optimization is automatically turned off in the presence of 
exception handling, for link once sections, for functions with a 
user-defined section attribute and on any architecture that does not 
support named sections.

2= Most RISC CPUs are cranky about branchy code and (lack of) cache 
locality.  Wouldn't suggest punting these either.

-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch 
memory to improve the performance of loops that access large arrays.

This option may generate better or worse code; results are highly 
dependent on the structure of loops within the source code.

3= OTOH, This one looks worth experimenting with turning off.

-ftree-vect-loop-version
Perform loop versioning when doing loop vectorization on trees. When 
a loop appears to be vectorizable except that data alignment or data 
dependence cannot be determined at compile time then vectorized and 
non-vectorized versions of the loop are generated along with runtime 
checks for alignment or dependence to control which version is 
executed. This option is enabled by default except at level -Os where 
it is disabled.

4= ...and this one looks like a 50/50 shot.

Ron Peacetree