Re: Query planner and foreign key constraints

Christian Schröder <cs@xxxxxxxxx> · Tue, 13 Jan 2009 08:23:00 +0100

Christian Schröder wrote:
When I join both tables using key1 and key2 there will be exactly 
1630788 rows because for each row in table2 there *must* exist a row 
in table1. But the query planner doesn't think so:

# explain analyze select * from table1 inner join table2 using (key1, 
key2);
                                                                 QUERY 
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------- 

Merge Join  (cost=0.00..94916.58 rows=39560 width=44) (actual 
time=0.103..7105.960 rows=1630788 loops=1)
  Merge Cond: ((table1.key1 = table2.key1) AND (table1.key2 = 
table2.key2))
  ->  Index Scan using table1_pkey on table1  (cost=0.00..22677.65 
rows=896576 width=20) (actual time=0.038..689.059 rows=896576 loops=1)
  ->  Index Scan using table2_pkey on table2  (cost=0.00..59213.16 
rows=1630788 width=44) (actual time=0.049..1108.220 rows=1630788 loops=1)
Total runtime: 7525.492 ms
(5 rows)
A wrong assumption about the number of rows in a join seems to be one 
major cause for the long running queries that we experience. Here is 
another example (part of a larger query):

->  Hash Join  (cost=18.73..369864.68 rows=33583 width=24) (actual 
time=2.994..9636.586 rows=883068 loops=1)
     Hash Cond: ((b."ID_ISSUER_GROUP" = ic."ID_ISSUER_GROUP") AND 
(substr((b."ISIN")::text, 1, 2) = (ic."ID_COUNTRY")::text))
     ->  Seq Scan on "Z_BASE" b  (cost=0.00..265745.99 rows=883099 
width=20) (actual time=0.048..5380.554 rows=883099 loops=1)
     ->  Hash  (cost=9.89..9.89 rows=589 width=14) (actual 
time=2.793..2.793 rows=589 loops=1)
           ->  Seq Scan on "ISSUER_CODES" ic  (cost=0.00..9.89 rows=589 
width=14) (actual time=0.047..1.151 rows=589 loops=1)

This join has about 25 times more rows than the query planner thinks. In 
my naive thinking, all further planning is simply wild guessing ...

What can I do to address this issue? I tried to create a functional 
index on substr(b."ISIN", 1, 2), but execution time became even worse. 
(I cancelled the query after several minutes.) Is there any way to tell 
the query planner about the (non-)selectivity of the hash condition? 
Would it help to increase the statistics target of one of the columns?

Regards,
   Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general