Re: 8.4 optimization regression?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02/09/11 11:18, Mark Kirkwood wrote:
On 02/09/11 11:13, Tom Lane wrote:
I wrote:
Mark Kirkwood<mark.kirkwood@xxxxxxxxxxxxxxx>  writes:
[ assorted examples showing that commit
7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
...
So, not only are you correct that we should revert the changes to
eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
I've retested these examples with the patches I committed yesterday.
Six of the eight examples are estimated pretty nearly dead on, while the
other two are estimated about 50% too high (still a lot better than
before).  AFAICT there's no easy way to improve those estimates further;
eqjoinsel_semi just plain hasn't got enough information to know how many
matches there will be.



Just noticed your two commits this morning and ran them through the examples too - results look really good! Not only are the plain join queries looking way better but that last semi join that was way off is now being estimated pretty close. Should be interesting to see how much this improves more complex queries!



While this is still fresh in your mind, a couple of additional anti join queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp );
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..851597.05 rows=1 width=0) (actual time=29956.971..50933.702 rows=5914 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29345.238..29345.238 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..22731.316 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000 AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp );
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..821806.96 rows=1 width=0) (actual time=46497.231..49196.057 rows=566 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29901.178..29901.178 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.008..23207.964 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)







--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux