On Fri, 07 Apr 2006 14:41:39 -0400 Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Gavin Hamill <gdh@xxxxxxxxxxxxx> writes: > > OK, having moved from our quad-xeon to an 8-CPU IBM pSeries 650 > > (8x1.45GHz POWER4 instead of 4 x 3GHz Xeon), our query times have shot > > up and our website is next to unusable. The IBM is not swapping (not > > with 16GB of RAM!), disk i/o is low, but there must be something > > critically wrong for this monster to be performing so badly.. > > Have you vacuumed/analyzed since reloading your data? Absolutely - a VACUUM FULL was the first thing I did, and have VACUUM ANALYZE VERBOSE'd a couple of times since. I have plenty of overhead to keep the entire free space map in RAM. > Compare some > EXPLAIN ANALYZE outputs for identical queries on the two machines, > that usually helps figure out what's wrong. If only :) Since 90% of the db work is the 'hotelsearch' function (which is 350 lines-worth that I'm not permitted to share :(( ), an EXPLAIN ANALYZE reveals practically nothing: ##### jalapeno (IBM) laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.90000::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..6.25 rows=500 width=1587) (actual time=2922.282..2922.908 rows=255 loops=1) -> Function Scan on hotelsearch (cost=0.00..12.50 rows=1000 width=1587) (actual time=2922.277..2922.494 rows=255 loops=1) Total runtime: 2923.296 ms (3 rows) ##### cayenne (xeon) laterooms=# EXPLAIN ANALYZE select * from hotelsearch(12.48333::numeric, 41.90000::numeric, 5::int4, '2006-04-13'::date, 5::int4, NULL::int4, 1::int4, NULL::int4, NULL::int4, TRUE::bool, FALSE::bool, FALSE::bool, 1::int2, 'GBP'::text, 'Roma'::text, 7::int4, NULL::int4, NULL::int4) limit 500; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..6.25 rows=500 width=1587) (actual time=1929.483..1930.103 rows=255 loops=1) -> Function Scan on hotelsearch (cost=0.00..12.50 rows=1000 width=1587) (actual time=1929.479..1929.693 rows=255 loops=1) Total runtime: 1930.506 ms (3 rows) The 'LIMIT 500' is a red herring since the function body will get all data, so reducing the LIMIT in the call to hotelsearch doesn't reduce the amount of work being done. The killer in it all is tail'ing the postgres log (which I have set only to log queries at 1000ms or up) is things will be returning at 1000-2000ms.. then suddenly shoot up to 8000ms.. and if I try a few of those 8000ms queries on the xeon box, they exec in ~1500ms.. and if I try them again a few moments later on the ibm, they'll also exec in maybe ~2500ms. This is one hell of a moving target and I can't help but think I'm just missing something that's right in front of my nose, too close to see. Cheers, Gavin.