Xeon twice the performance of opteron

Jeff <threshar@xxxxxxxxxxxxxxxxxxxxxxx> · Thu, 17 Mar 2011 11:13:58 -0400

hey folks,

Running into some odd performance issues between a few of our db  
boxes.  While trying to speed up a query I ran it on another box and  
it was twice as fast.  The plans are identical and various portions of  
the query run in the same amount of time - it all boils down to most  
of the time being spent in a join filter.  The plan is as good as it  
is going to get but the thing that is concerning me, which hopefully  
some folks here may have some insight on, is the very large difference  
in runtime.

three boxes:
	A: Intel(R) Xeon(R) CPU           E5345  @ 2.33GHz  (Runs query  
fastest)
		4MB cache
	B: Quad-Core AMD Opteron(tm) Processor 2352 (2.1GHZ) (Main production  
box, currently, middle speed)
		512k cache
	C: Quad-Core AMD Opteron(tm) Processor 2378 (2.4GHZ)
		512k cache

A & B are running PG 8.4.2 (yes, I know it desperately need to be  
upgraded). C was also on 8.4.2 and since it was not in production I  
upgraded it to 8.4.7 and got the same performance as 8.4.2.  Dataset  
on A & B is the same C is mostly the same, but is missing a couple  
weeks of data (but since this query runs over 3 years of data, it is  
negligable - plus C runs the slowest!)

All three running FC10 with kernel Linux db06  
2.6.27.19-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009  
x86_64 x86_64 x86_64 GNU/Linux

Load is very low on each box. The query is running from shared_buffers  
- no real IO is occuring.

The average timing for the query in question is 90ms on A,  180ms on B  
and 190ms on C.

Now here's where some odd stuff starts piling up: explain analyze  
overhead on said queries:
20ms on A, 50ms on B and 85ms on C(!!)

We had one thought about potential NUMA issues, but doing a series  
(100) of connect, query, disconnect and looking at the timings reveals  
them all to be solid... but even still we wouldn't expect it to be  
that awful.  The smaller cache of the opterons is also a valid argument.

I know we're running an old kernel, I'm tempted to upgrade to see what  
will happen, but at the same time I'm afraid it'll upgrade to a kernel  
with a broken [insert major subsystem here] which has happened before.

Anybody have some insight into this or run into this before?

btw, little more background on the query:

   ->  Nested Loop  (cost=5.87..2763.69 rows=9943 width=0) (actual  
time=0.571..2
74.750 rows=766 loops=1)
         Join Filter: (ce.eventdate >= (md.date - '6 days'::interval))
         ->  Nested Loop  (cost=5.87..1717.98 rows=27 width=8)  
(actual time=0.53
3..8.301 rows=159 loops=1)
		[stuff removed here]
        ->  Index Scan using  xxxxxxx_date_idx on xxxxxx md
(cost=0.00..19.50 rows=1099 width=8) (actual time=0.023..0.729  
rows=951 loops=15
9)
               Index Cond: (ce.eventdate <= md.date)

On all three boxes that inner nestloop completes in about the same  
amount of time - it is that join filter that is causing the pain and  
agony. (If you are noticing the timing differences, that is because  
the numbers above are the actual numbers, not explain analyze).  The  
query is pulling up a rolling window of events that occured on a  
specific date. This query pulls up al the data for a period of time.    
ce.eventdate is indexed, and is used in the outer nestloop.  Thinking  
more about what is going on cache thrashing is certainly a possibility.

the amazing explain analyze overhead is also very curious - we all  
know it adds overhead, but 85ms?  Yow.

--
Jeff Trout <jeff@xxxxxxxxxxxxx>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance