On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan <mohangk@xxxxxxxxx> wrote: > I have a query that runs a lot slower (~5 minutes) when I run it with > the default enable_nestloop=true and enable_nestloop=false (~10 secs). > The actual query is available here http://pastie.org/2754424 . It is a > reporting query with many joins as the database is mainly used for > transaction processing. > > Explain analyse result for both cases: > > Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes) > Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs) A good start might be to refactor this: Seq Scan on retailer_categories retailer_category_leaf_nodes (cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194 loops=1) Filter: ((tree_right - tree_left) = 1) And this: Seq Scan on product_categories product_category_leaf_nodes (cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383 loops=1) Filter: ((tree_right - tree_left) = 1) The query planner really has no idea what selectivity to assign to that condition, and so it's basically guessing, and it's way off. You could probably improve the estimate a lot by adding a column that stores the values of tree_right - tree_left and is updated manually or by triggers as you insert and update data. Then you could just check tree_left_right_difference = 1, which should get a much more accurate estimate, and hopefully therefore a better plan. You've also got a fairly large estimation error here: Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28 rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1) Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date <= '2011-09-15'::date)) Filter: (status = 128) Apparently, status 128 is considerably more common among rows in that date range than it is overall. Unfortunately, it's not so easy to fix this kind of estimation error, unless you can refactor your schema to avoid needing to filter on both create_date and status at the same time. It might be worth using temporary tables here - factor out sections of the query that are referenced multiple times, like the join between sales_order_items and invoices, and create a temporary table. ANALYZE it, and then use it to run the main query. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance