Re: 8.4 optimization regression?

Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> · Fri, 02 Sep 2011 11:54:23 +1200

On 02/09/11 11:18, Mark Kirkwood wrote:
On 02/09/11 11:13, Tom Lane wrote:
I wrote:
Mark Kirkwood<mark.kirkwood@xxxxxxxxxxxxxxx>  writes:
[ assorted examples showing that commit
7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
...
So, not only are you correct that we should revert the changes to
eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
I've retested these examples with the patches I committed yesterday.
Six of the eight examples are estimated pretty nearly dead on, while the
other two are estimated about 50% too high (still a lot better than
before).  AFAICT there's no easy way to improve those estimates further;
eqjoinsel_semi just plain hasn't got enough information to know how many
matches there will be.

Just noticed your two commits this morning and ran them through the 
examples too - results look really good! Not only are the plain join 
queries looking way better but that last semi join that was way off is 
now being estimated pretty close. Should be interesting to see how 
much this improves more complex queries!

While this is still fresh in your mind, a couple of additional anti join 
queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND 
n.updated > '2011-01-01'::timestamp );
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=501666.88..851597.05 rows=1 width=0) (actual 
time=29956.971..50933.702 rows=5914 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560 
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual 
time=29345.238..29345.238 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp 
without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000 
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND 
n.updated > '2011-01-01'::timestamp );
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=501666.88..821806.96 rows=1 width=0) (actual 
time=46497.231..49196.057 rows=566 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
   ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921 
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
         Filter: (keywordid < 10000)
   ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual 
time=29901.178..29901.178 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
         ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510 
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
               Filter: (updated > '2011-01-01 00:00:00'::timestamp 
without time zone)

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance