Re: 8.4 optimization regression?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24/08/11 15:15, Tom Lane wrote:
Mark Kirkwood<mark.kirkwood@xxxxxxxxxxxxxxx>  writes:
I am in the progress of an 8.3 to 8.4 upgrade for a customer. I seem to
have stumbled upon what looks like a regression. The two databases
(8.3.14 and 8.4.8) have identical tuning parameters (where that makes
sense) and run on identical hardware. Both databases are regularly
vacuumed and analyzed (not by autovacuum), and performing an ANALYZE
does not change the plans shown below.
Hmmm ... this is structurally a pretty simple query, so I'm surprised
that 8.3 and 8.4 see it very much differently.  The relation-level
estimates and plan choices are very nearly the same; the only thing
that's changed much is the estimates of the join sizes, and there were
not that many changes in the join selectivity estimation for simple
inner joins.  I wonder whether you are seeing a bad side-effect of this
patch:

http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30

That code would only be reached when one or both join columns lack MCV
lists in pg_stats; if you had analyzed, the only reason for that to be
the case is if the column is unique (or nearly so, in ANALYZE's opinion).

	

I've come up with (hopefully) a good set of semi, anti and regular joins to demonstrate the effect of this commit. I've attached them, and the schema generator (I believe I've used this before for optimization examples...).

Also I've tried out an experimental patch to make joins like the one I'm having trouble with *and* also the anti joins the commit was for - get better row estimates.

So firstly consider an anti join (these are run against git HEAD rather than 8.4):

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-12-01'::timestamp );

With commit:
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=426079.34..765699.66 rows=1599293 width=0) (actual time=29907.716..47255.825 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.373..11838.738 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29883.980..29883.980 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.339..29295.764 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


Without commit:
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=426079.34..760501.96 rows=1 width=0) (actual time=30409.336..47919.613 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.359..12081.372 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=30392.235..30392.235 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.384..29806.407 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


Note the rows estimate for the anti join is hopelessly wrong, so clearly the commitdoes the job here (I think this models the test case for said commit)!

Now some joins:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid = nk.nodeid) WHERE n.updated > '2011-01-01'::timestamp AND nk.keywordid < 100000;

With commit:
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=30032.836..53073.731 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.327..14393.629 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30017.777..30017.777 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.005..23272.287 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


Without commit:
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=501666.88..871510.70 rows=1991365 width=0) (actual time=30549.498..54852.399 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.331..13760.417 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30534.464..30534.464 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.005..23696.167 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


Another join:

EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid = nk.nodeid) WHERE n.updated > '2011-12-01'::timestamp AND nk.keywordid < 100000;

With commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=426079.34..764424.63 rows=392267 width=0) (actual time=29295.966..45578.876 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.452..12367.760 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29273.571..29273.571 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.899..28678.818 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


Without commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=426079.34..762064.41 rows=156245 width=0) (actual time=29179.313..44605.243 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.486..11546.469 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29156.889..29156.889 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.915..28545.553 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


So in the case where we filer out a large percentage of the rows the commit inflates the estimates...consider a more extreme example:


EXPLAIN ANALYZE SELECT 1 FROM NODE n JOIN nodekeyword nk ON (n.nodeid = nk.nodeid) WHERE n.updated > '2011-12-27'::timestamp AND nk.keywordid < 10000;

With commit:
                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..569488.45 rows=16344 width=0) (actual time=55.452..65341.135 rows=604 loops=1) -> Seq Scan on node n (cost=0.00..419643.00 rows=16344 width=4) (actual time=13.537..46138.214 rows=14952 loops=1) Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone) -> Index Scan using nodekeyword_pk on nodekeyword nk (cost=0.00..9.16 rows=1 width=4) (actual time=1.277..1.279 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


Without commit:
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..569488.45 rows=631 width=0) (actual time=43.969..64988.036 rows=604 loops=1) -> Seq Scan on node n (cost=0.00..419643.00 rows=16344 width=4) (actual time=2.060..46065.879 rows=14952 loops=1) Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone) -> Index Scan using nodekeyword_pk on nodekeyword nk (cost=0.00..9.16 rows=1 width=4) (actual time=1.259..1.260 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


So clearly this commit is not so good for this type of join (this models the case I posted initially).

Now four semi joins:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-12-01'::timestamp );

With commit:
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..753629.40 rows=392267 width=0) (actual time=28405.965..43724.471 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.767..11561.340 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28391.293..28391.293 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.038..27820.097 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..780417.56 rows=1991560 width=0) (actual time=29447.638..44738.280 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.771..11501.350 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29433.952..29433.952 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.040..28850.800 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Clearly row estimation is hopelessly broken *without* this commit here.

Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp );

With commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=29048.154..51230.453 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.423..13430.618 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29024.442..29024.442 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..22384.904 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


Without commit:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=28914.970..51162.918 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.504..13780.506 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=28891.705..28891.705 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.008..22082.459 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


Another semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000 AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp );

With commit:
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..823736.17 rows=192921 width=0) (actual time=30120.347..49646.175 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.359..16335.889 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30072.444..30072.444 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..23409.799 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

Without commit:
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..823736.17 rows=192921 width=0) (actual time=29395.513..48857.600 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.528..16261.983 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29348.826..29348.826 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..22505.930 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


Final semi join:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000 AND EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-12-01'::timestamp );

With commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..730392.78 rows=192921 width=0) (actual time=29060.665..44713.615 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.366..15064.457 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29026.017..29026.017 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.039..28441.039 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Without commit:
                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..730392.78 rows=192921 width=0) (actual time=28969.107..43725.339 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.486..14198.613 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28935.248..28935.248 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.047..28343.005 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

Well this guy is too sneaky for either case :-(

We seem to need a patch variant that *only* clamps the estimates in the anti or semi join case, e.g (note against git HEAD):

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e065826..bf5002f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2257,11 +2257,6 @@ eqjoinsel_inner(Oid operator,
        double      nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
        double      nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;

-       if (vardata1->rel)
-           nd1 = Min(nd1, vardata1->rel->rows);
-       if (vardata2->rel)
-           nd2 = Min(nd2, vardata2->rel->rows);
-
        selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
        if (nd1 > nd2)
            selec /= nd1;


Now run all the queries, 1st the anti join:
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=426079.34..765699.66 rows=1599293 width=0) (actual time=30121.008..48503.453 rows=1839193 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.623..12853.522 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=30108.058..30108.058 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.347..29508.393 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


And the  3 joins:

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=501666.88..871510.70 rows=1991365 width=0) (actual time=30148.073..52370.308 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.291..13300.233 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30124.453..30124.453 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..23334.774 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)


                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=426079.34..762064.41 rows=156245 width=0) (actual time=29954.251..46014.379 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.420..12126.142 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29936.578..29936.578 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=10.934..29357.789 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)


                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..569488.45 rows=631 width=0) (actual time=44.065..67179.686 rows=604 loops=1) -> Seq Scan on node n (cost=0.00..419643.00 rows=16344 width=4) (actual time=2.165..48523.075 rows=14952 loops=1) Filter: (updated > '2011-12-27 00:00:00'::timestamp without time zone) -> Index Scan using nodekeyword_pk on nodekeyword nk (cost=0.00..9.16 rows=1 width=4) (actual time=1.241..1.242 rows=0 loops=14952)
         Index Cond: ((nodeid = n.nodeid) AND (keywordid < 10000))


And the 4 semi joins...

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..753629.40 rows=392267 width=0) (actual time=29355.949..45220.958 rows=160587 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=5.731..11983.132 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=29342.387..29342.387 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.039..28763.514 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..871512.65 rows=1991560 width=0) (actual time=30823.334..53136.910 rows=1993866 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=12.555..13881.366 rows=1999780 loops=1)
         Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=30800.017..30800.017 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..24028.932 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=501666.88..823736.17 rows=192921 width=0) (actual time=29278.861..48346.647 rows=199050 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.523..15809.390 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29232.161..29232.161 rows=4985269 loops=1)
         Buckets: 4096  Batches: 256  Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.009..22541.899 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=426079.34..730392.78 rows=192921 width=0) (actual time=28594.210..42976.001 rows=16003 loops=1)
   Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=12.581..13810.924 rows=199616 loops=1)
         Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=392267 width=4) (actual time=28560.258..28560.258 rows=401678 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 891kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=392267 width=4) (actual time=0.048..27983.235 rows=401678 loops=1) Filter: (updated > '2011-12-01 00:00:00'::timestamp without time zone)

(this last one was wildly inaccurate pre patching)

So this looks quite encouraging (unless I have overlooked a set of queries that now perform worse - which could be the case), thoughts?

regards

Mark



Attachment: join-schema.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux