On 09.07.2011 00:36, Anish Kejariwal wrote:
My guess as to what happened: -because the icecream parent table has zero records, the query optimizer chooses the incorrect execution plan -when I do select * from icecream, the optimizer now knows how many records are really in the icecream table, by knowing that the icecream table has partitions.
"select * from icecream" won't have any direct effect on the optimization of subsequent queries. What probably happened is that autoanalyze ran in the background while you ran that select, and analyzed some of the partitions. Simply waiting a while would've had the same effect.
Next, if I run vacuum analyze on the parent table, I again get a wrong/slow execution plan (this time it uses the hash join). Again, I think this is because the parent table itself has zero records. QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=191926.03..191962.95 rows=1136 width=41) (actual time=28967.567..28994.395 rows=11028 loops=1) -> Hash Join (cost=166424.79..191585.47 rows=11352 width=41) (actual time=28539.196..28917.830 rows=11028 loops=1) Hash Cond: (r.widget_id = rc.widget_id) -> Append (cost=4.28..1149.30 rows=11352 width=21) (actual time=0.054..54.068 rows=11028 loops=1) -> Bitmap Heap Scan on widget r (cost=4.28..12.75 rows=1 width=48) (actual time=0.013..0.013 rows=0 loops=1) Recheck Cond: (widgetset_id = 5) Filter: (widget_type_id = 4) -> Bitmap Index Scan on widget_widgetset_id_idx (cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (widgetset_id = 5) -> Index Scan using widget_part_5_widget_widget_type_id_idx on widget_part_5 r (cost=0.00..1136.55 rows=11351 width=21) (actual time=0.035..22.419 rows=11028 loops=1) Index Cond: (widget_type_id = 4) Filter: (widgetset_id = 5) -> Hash (cost=92214.73..92214.73 rows=4041823 width=24) (actual time=28438.419..28438.419 rows=4041866 loops=1) Buckets: 524288 Batches: 2 Memory Usage: 118449kB -> Append (cost=0.00..92214.73 rows=4041823 width=24) (actual time=0.020..14896.908 rows=4041866 loops=1) -> Seq Scan on icecream rc (cost=0.00..23.00 rows=5 width=24) (actual time=0.002..0.002 rows=0 loops=1) Filter: (dataset_id = 281) -> Seq Scan on icecream_part_281 rc (cost=0.00..92191.73 rows=4041818 width=24) (actual time=0.012..5718.592 rows=4041866 loops=1) Filter: (dataset_id = 281) Total runtime: 29007.937 ms (20 rows)
The cost estimates in the above slow plan are pretty accurate, so I suspect the cost estimates for the fast plan are not, or the planner would choose that.
I could of course disable hash join and merge join to force postgres to use a nested loop, but my system is often joining these two tables, and I'd rather not have to set this in every single place. set enable_mergejoin=off; set enable_hashjoin=off; set enable_nestloop = on;
Can you do explain analyze with these settings? That might give us a clue on where it's going wrong.
Also, I suspect that when you load more data into icecream, the planner might start to pick the faster plan, because the seqscan on icecream will start to look more expensive compared to the index scan and nested loop join in the faster plan.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance