Hi again :)
This is a follow-up to the mega thread which made a Friday night more
interesting [1] - the summary is various people thought there was some
issue with shared memory access on AIX.
I then installed Debian (kernel 2.6.11) on the 8-CPU p650 (native - no
LPAR) and saw just as woeful performance.
Now I've had a chance to try a 2-CPU dual-core Opteron box, and it
*FLIES* - the 4-way machine sits churning through our heavy
'hotelsearch' function at ~400ms per call.
Basically, this pSeries box is available until Monday lunchtime if any
pg developer wants to pop in, run tests, or mess around. I am convinced
that the hardware itself cannot be this poor - it has to be some failing
of pg when mixed with our dataset / load pattern.
e.g. If I run 'ab -n 200 -c 4 -k http://localhost/test.php' [2] with
pg_connect pointed at the pSeries, it turns in search times of ~3500ms
with loadavg of 4.
The same test with pg_connect pointed at the dual-Opteron turns in
~300ms searches, with loadavg of 3.5 .. something is very very wrong
with the pSeries setup :)
If I crank up the heat and run apachebench with 10 hammering clients
instead of 4, the differences become even more stark. pSeries:
5000-15000ms, loadavg 9. Opteron: ~3000ms, loadavg 8. 90% of queries on
the Opteron conclude in under 4000ms, and it maxes out at 6.5 searches
per second. The pSeries manages 0.9 searches per second. (!)
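
As a rough sanity check on how concurrency, search time and searches per
second relate, here is a minimal Python sketch. It assumes each ab client
sends one request at a time with no pause between requests (a closed loop),
so throughput is roughly clients divided by mean latency. The function names
are made up for illustration and nothing here was run on either box.

```python
# Sketch only: closed-loop approximation, throughput = clients / mean latency.
# Function names are hypothetical; inputs are the figures quoted in this post.

def throughput_per_second(clients: int, mean_latency_ms: float) -> float:
    """Searches per second sustained by `clients` back-to-back requesters."""
    if clients <= 0 or mean_latency_ms <= 0:
        raise ValueError("clients and mean_latency_ms must be positive")
    return clients / (mean_latency_ms / 1000.0)


def implied_latency_ms(clients: int, searches_per_second: float) -> float:
    """Mean latency (ms) implied by a measured throughput at a given concurrency."""
    if clients <= 0 or searches_per_second <= 0:
        raise ValueError("clients and searches_per_second must be positive")
    return clients / searches_per_second * 1000.0


if __name__ == "__main__":
    # 4 clients at ~3500ms per search (pSeries, first test)
    print(round(throughput_per_second(4, 3500), 2))   # 1.14
    # 10 clients at 0.9 searches/s (pSeries, second test)
    print(round(implied_latency_ms(10, 0.9)))          # 11111
```

Under that assumption, 0.9 searches per second with 10 clients implies a mean
search time of about 11 seconds, which sits inside the 5000-15000ms range
seen on the pSeries.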
Databases on both machines have seen a VACUUM FULL and VACUUM ANALYZE
before testing, and have near-identical postgresql.conf files. (The
pSeries has twice the RAM.)
This post is not intended to be whining that 'pg is crap on pSeries!' -
I'm trying to make a resource available (albeit for a short time) to
help fix a problem that will doubtless affect others in future - for
certain we're never going midrange again! :O
Cheers,
Gavin.
[1] http://archives.postgresql.org/pgsql-performance/2006-04/msg00143.php
[2] Trivial script which does a pg_connect, runs a random hotelsearch
and exits.