At 10:55 AM 12/15/2006, Merlin Moncure wrote:
On 12/15/06, Ron <rjpeace@xxxxxxxxxxxxx> wrote:
There are many instances of x86 compatible code that get
30-40% speedups just because they get access to 16 rather than 8 GPRs
when recompiled for x86-64.
...We benchmarked PostgreSQL internally here and found it to be
fastest in 32 bit mode running on a 64 bit platform -- this was on a
quad Opteron 870 running our specific software stack; your results
might be different of course.
On AMD Kx's, you probably will get best performance in 64b mode (so
you get all those extra registers and other stuff) while using 32b
pointers (to keep code size and memory footprint down).
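To make the footprint point concrete, here is a rough sketch of how pointer width changes the size of a pointer-heavy structure. The node layout (one 4-byte key plus two child pointers) and the natural-alignment rule are illustrative assumptions, not anything taken from the pg source.

```python
def struct_size(field_sizes):
    """Size in bytes of a C-style struct, assuming each field is
    aligned to its own size and the struct is padded to its largest
    field alignment (a common ABI convention, assumed here)."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size
        # Round the offset up to the next multiple of the alignment.
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so arrays of this struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def node_size(pointer_bytes):
    """Hypothetical tree node: int32 key, left pointer, right pointer."""
    return struct_size([4, pointer_bytes, pointer_bytes])


if __name__ == "__main__":
    for width in (4, 8):
        print(f"{width}-byte pointers: node is {node_size(width)} bytes")
```

Under those assumptions the node doubles in size when pointers go from 4 to 8 bytes, so fewer nodes fit in each cache line. Whether that matters for a given workload still has to be measured.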
On Intel C2's, things are more complicated since Intel's x86-64
implementation and memory IO architecture are both different enough
from AMD's to have caused some consternation on occasion when Intel's
64b performance did not match AMD's.
The big arch specific differences in Kx's are in 64b mode. Not 32b.
I don't think so. IMO all the processor specific instruction sets were
hacks of 32 bit mode to optimize specific tasks. Except for certain
things these instructions are rarely, if ever used in 64 bit mode,
especially in integer math (i.e. database binaries). Since Intel and
AMD64 64 bit are virtually identical, I submit that -march is not
really important anymore except for very, very specific (but
important) cases like spinlocks.
Take a good look at the processor specific manuals and the x86-64
benches around the net. The evidence outside the DBMS domain is
pretty solidly in contrast to your statement and
position. Admittedly, DBMS are not web servers or FPS games or ...
That's why we need to do our own rigorous study of the subject.
This thread is about how much architecture dependent binaries can beat
standard ones. I say they don't very much at all, and with the
specific exception of Daniel's benchmarking, the results posted to
this list bear that out.
...and IMHO the issue is still very much undecided given that we
don't have enough properly obtained and documented evidence.
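As one possible shape for "properly obtained and documented", here is a minimal sketch that repeats a workload several times and records the platform alongside the timings. The function names and report fields are hypothetical; a real comparison would also need to record compiler flags, pg version and configuration, and the IO subsystem.

```python
import platform
import statistics
import time


def time_runs(workload, repeats=5):
    """Call workload() `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def report(label, timings):
    """Bundle timings with basic platform details so runs can be compared."""
    return {
        "label": label,
        "machine": platform.machine(),   # e.g. the CPU architecture string
        "system": platform.system(),
        "runs": len(timings),
        "mean_s": statistics.mean(timings),
        # Sample standard deviation needs at least two runs.
        "stdev_s": statistics.stdev(timings) if len(timings) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder workload; substitute the benchmark actually under test.
    result = report("placeholder", time_runs(lambda: sum(range(100_000))))
    print(result)
```

Reporting the spread along with the mean makes it easier to tell a real difference between builds from run-to-run noise.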
ATM, the most we can say is that in a number of systems with modest
physical IO subsystems that are not running Gentoo Linux we have not
been able to duplicate the results. (We've also gotten some
interesting results like yours suggesting the arch specific
optimizations are bad for pg performance in your environment.)
In the process questions have been asked and issues raised regarding
both the tools involved and the proper way to use them.
We really do need to have someone other than Daniel duplicate his
Gentoo environment and independently try to duplicate his results.
...and let us bear in mind that this is not just intellectual
curiosity. The less pg is mysterious, the better the odds pg will be
adopted in any specific case.
Ron Peacetree