Michael,

On 12/11/06 9:31 AM, "Michael Stone" <mstone+postgres@xxxxxxxxx> wrote:

> [1] I will say that I have never seen a realistic benchmark of general
> code where the compiler flags made a statistically significant
> difference in the runtime.

Here's one - I wrote a general-purpose Computational Fluid Dynamics analysis method used by hundreds of people to perform aircraft and propulsion systems analysis. Compiler flag tuning would speed it up by factors of 2-3 or even more on some architectures. The reason it was so effective is that the structure of the code was designed to be general, but also to expose the critical performance sections in a way that the compilers could use - deep pipelining/vectorization, unrolling, etc., were carefully made easy for the compilers to exploit in critical sections. Yes, this made the code in those sections harder to read, but it was a common practice because it might take weeks of runtime to get an answer and performance mattered.

The problem I see with general-purpose DBMS code the way it's structured in pgsql (and others) is that many of the critical performance sections are embedded in abstract interfaces that obscure them from optimization. For example, a simple "is equal to" operation has many layers surrounding it to ensure that UDFs can be declared and that special comparison semantics can be accommodated. But if you're simply performing a large number of INT vs. INT comparisons, it will be thousands of times slower than a CPU-native operation because of the function call overhead, etc. I've seen presentations that show the IPC (instructions per cycle) of Postgres at about 0.5, versus the 2-4 possible from the CPU.

Column databases like C-Store remove these abstractions at planner time to expose native operations in large chunks to the compiler, and the IPC reflects that - typically 1+ and as high as 2.5.
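
To make the contrast concrete, here is a rough sketch in C of the two shapes of code I mean. This is not actual PostgreSQL or C-Store code; datum_t, eq_fn, int4_eq, count_eq_generic and count_eq_int32 are made-up names for illustration only. The first loop goes through a comparator that is only known at run time, and the second has the type fixed up front, as a planner could arrange.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic value slot (not PostgreSQL's Datum). */
typedef uintptr_t datum_t;

/* Hypothetical comparator signature, chosen per data type at run time. */
typedef int (*eq_fn)(datum_t a, datum_t b);

/* Equality for 32-bit integers stored in a datum_t.
 * Comparing the low 32 bits as unsigned is equivalent to comparing
 * the signed values for equality and avoids signed-conversion issues. */
int int4_eq(datum_t a, datum_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* Generic path: one indirect call per row. When the pointer is only
 * known at run time, the compiler typically cannot inline the call
 * or vectorize the loop around it. */
size_t count_eq_generic(const datum_t *col, size_t n, datum_t key, eq_fn eq)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (eq(col[i], key) != 0);
    return hits;
}

/* Specialized path: the element type is known up front, so the loop
 * body is a plain integer compare over a contiguous array, which is
 * the kind of loop compilers can unroll and vectorize. */
size_t count_eq_int32(const int32_t *col, size_t n, int32_t key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (col[i] == key);
    return hits;
}
```

Both functions return the same count for the same data; the difference is only in how much of the work the compiler can see. How large the gap is in practice depends on the compiler, flags and CPU, so it needs to be measured rather than assumed.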
If we were to redesign the executor and planner to emulate that same structure, we could achieve similar speedups and the compiler would matter more.

- Luke