Odd performance issue

Bart Grantham <bart@xxxxxxxxxxxxxx> · Mon, 20 Jun 2005 15:27:41 -0400

Summary: depending on the value, the planner will sometimes choose a seq 
scan, sometimes an index scan.  The former produces terrible performace, 
the latter great performance.

The long story: we had a disk failure (NOT the disk the db was on) and 
the machine's system disk had to be rebuilt from the raid array and 
re-GRUB'ed.  Now that the the system is back up we are seeing terrible 
performance (or more accurately, wildly varying performance).  I've 
tried re-importing the data from the live system (this is new hardware 
under testing for the system) and re-initing the db cluster.  A specific 
example is probably best.  This 'connections' table has about 922K 
rows.  The difference here is node_id's 28542 vs. 28560.  Using 28542 
causes an index scan, 28560 causes a seq scan:

The details:
logicops2=> explain SELECT * from connections AS c LEFT JOIN 
connection_types AS ct ON ct.connection_type_id = c.connection_type_id 
WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;
                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=1.29..563.05 rows=203 width=116)
  ->  Hash Left Join  (cost=1.29..563.05 rows=203 width=116)
        Hash Cond: ("outer".connection_type_id = 
"inner".connection_type_id)
        ->  Index Scan using c_connector_node_id, c_connectee_node_id 
on connections c  (cost=0.00..558.72 rows=203 width=33)
              Index Cond: ((connector_node_id = 28542) OR 
(connectee_node_id = 28542))
              Filter: (connection_type_id < 1000)
        ->  Hash  (cost=1.23..1.23 rows=23 width=83)
              ->  Seq Scan on connection_types ct  (cost=0.00..1.23 
rows=23 width=83)
(8 rows)

Time: 0.935 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS 
ct ON ct.connection_type_id = c.connection_type_id WHERE ( 
connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(12 rows)

Time: 1.887 ms

-vs-

logicops2=> explain SELECT * from connections AS c LEFT JOIN 
connection_types AS ct ON ct.connection_type_id = c.connection_type_id 
WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Limit  (cost=1.29..686.09 rows=300 width=116)
  ->  Hash Left Join  (cost=1.29..24939.39 rows=10925 width=116)
        Hash Cond: ("outer".connection_type_id = 
"inner".connection_type_id)
        ->  Seq Scan on connections c  (cost=0.00..24774.23 rows=10925 
width=33)
              Filter: (((connector_node_id = 28560) OR 
(connectee_node_id = 28560)) AND (connection_type_id < 1000))
        ->  Hash  (cost=1.23..1.23 rows=23 width=83)
              ->  Seq Scan on connection_types ct  (cost=0.00..1.23 
rows=23 width=83)
(7 rows)

Time: 0.704 ms
logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS 
ct ON ct.connection_type_id = c.connection_type_id WHERE ( 
connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( 
c.connection_type_id < 1000 ) LIMIT 300;
...results...
(7 rows)

Time: 578.597 ms

... it may be relevant that one node_id has 15 times as many connections:

logicops2=> select count(*) from connections where connector_node_id = 
28542 OR connectee_node_id = 28542;
count
-------
  856
(1 row)

Time: 1.424 ms
logicops2=> select count(*) from connections where connector_node_id = 
28560 OR connectee_node_id = 28560;
count
-------
13500
(1 row)

Time: 559.696 ms

... but that shouldn't make a difference to the planner, should it?  
Yes, I've vacuum analyzed.

Also, I was wondering if someone could correct me on a bit of array 
syntax.  I'd like to have a query pass back an array of ints to a 
function call.  Something like this:
logicops2=> select * from nodes2ancestors(array[(select node_id from 
nodes where node_type_id = 3)]::int[], 0);
ERROR:  more than one row returned by a subquery used as an expression

Thanks for any help/pointers you guys can provide.  I really appreciate 
it as I'm down to the wire on a project and this performance thing has 
really blindsided us.

Bart

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
     joining column's datatypes do not match