On 02/09/11 11:18, Mark Kirkwood wrote:
On 02/09/11 11:13, Tom Lane wrote:
I wrote:
Mark Kirkwood<mark.kirkwood@xxxxxxxxxxxxxxx> writes:
[ assorted examples showing that commit
7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
...
So, not only are you correct that we should revert the changes to
eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
I've retested these examples with the patches I committed yesterday.
Six of the eight examples are estimated pretty nearly dead on, while the
other two are estimated about 50% too high (still a lot better than
before). AFAICT there's no easy way to improve those estimates further;
eqjoinsel_semi just plain hasn't got enough information to know how many
matches there will be.
Just noticed your two commits this morning and ran them through the
examples too - results look really good! Not only are the plain join
queries looking way better but that last semi join that was way off is
now being estimated pretty close. Should be interesting to see how
much this improves more complex queries!
While this is still fresh in your mind, a couple of additional anti join
queries are still managing to sneak past estimation:
EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..851597.05 rows=1 width=0) (actual
time=29956.971..50933.702 rows=5914 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29345.238..29345.238 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)
EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..821806.96 rows=1 width=0) (actual
time=46497.231..49196.057 rows=566 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29901.178..29901.178 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance