Hi list,
I have the following query:
SELECT *
FROM base
INNER JOIN pt USING (x)
WHERE pt.y IN ('1121', '11411', '11421', '1161', '1162');
"explain analyze" yields the following result:
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=529.18..164930.70 rows=28374 width=0) (actual time=10.834..4877.326 rows=245298 loops=1)
   Hash Cond: (b.x = pt.x)
   ->  Seq Scan on b  (cost=0.00..159579.93 rows=1210093 width=4) (actual time=0.018..2464.871 rows=1210250 loops=1)
   ->  Hash  (cost=527.41..527.41 rows=142 width=4) (actual time=10.775..10.775 rows=138 loops=1)
         ->  Seq Scan on pt  (cost=0.00..527.41 rows=142 width=4) (actual time=0.057..10.556 rows=138 loops=1)
               Filter: (y = ANY ('{1121,11411,11421,1161,1162}'::bpchar[]))
 Total runtime: 5170.837 ms
As you can see, the estimated row count for the join (28374) is much lower
than the actual count (245298). Normally, this join is part of a more complex
query which gets really slow, probably (among other reasons) because the
query planner bases the rest of its plan on this incorrect row estimate.
My question is: what can I do to improve that estimate? There is a
foreign key from base.x to pt.x, and both columns are declared NOT NULL.
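For context, the relevant schema looks roughly like this (the exact column
types are assumptions on my part; only the names, the foreign key, and the
NOT NULL constraints are as described above):

```sql
-- Sketch of the assumed schema; actual definitions may differ.
CREATE TABLE pt (
    x integer PRIMARY KEY,
    y char(5) NOT NULL          -- bpchar, per the plan's filter cast
);

CREATE TABLE base (
    x integer NOT NULL REFERENCES pt (x)   -- the foreign key mentioned above
);
```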
A side question: what can I do to speed up the hash join itself? I
understand that about 2.5 seconds are spent on the sequential scan of
table b, but where do the remaining 2.5 seconds come from?
Thanks a lot in advance,
Christian
--
Deriva GmbH Tel.: +49 551 489500-42
Financial IT and Consulting Fax: +49 551 489500-91
Hans-Böckler-Straße 2 http://www.deriva.de
D-37079 Göttingen
Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general