Markus Bertheau wrote:
I'm getting a plan that uses a sequential scan on ext_feeder_item instead of
several index scans, which slows down the query significantly:
# explain analyze
select fi.pub_date
from ext_feeder_item fi
where fi.feed_id in (select id from ext_feeder_feed ff where ff.is_system)
order by pub_date desc;
Sort (cost=298545.70..299196.46 rows=260303 width=8) (actual time=89299.623..89302.146 rows=807 loops=1)
Using LIMIT in the subquery, I can see that starting at 50 values in the IN
list the planner starts to prefer the seq scan. Plan for LIMIT 49:
Sort (cost=277689.24..277918.39 rows=91660 width=8) (actual time=477.769..478.193 rows=137 loops=1)
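For reference, the LIMIT variant changes only the subquery; the test above
was of this shape:

# explain analyze
select fi.pub_date
from ext_feeder_item fi
where fi.feed_id in (select id from ext_feeder_feed ff where ff.is_system limit 49)
order by pub_date desc;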
Note that the rows estimate for the index scan is way off. Increasing the
statistics target for ext_feeder_item.feed_id to 100 lets the planner favor
the index scan up to LIMIT 150 in the subquery.
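For completeness, that change is the usual per-column setting plus a fresh
ANALYZE so the planner sees the new statistics:

alter table ext_feeder_item alter column feed_id set statistics 100;
analyze ext_feeder_item;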
With enable_seqscan=false, I see that the index scan plan continues to
outperform the seq scan plan even with LIMIT 1500 in the subquery (1196
values actually returned from it):
Sort (cost=100925142.27..100925986.74 rows=337787 width=8) (actual time=102.111..104.627 rows=807 loops=1)
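The seq scan was discouraged per session like this; note that the ~100000000
in the cost above is the constant penalty the planner adds to disabled plan
types, so that estimate is not comparable to the earlier ones:

set enable_seqscan = off;  -- per session; the planner avoids seq scans where possible
-- rerun the explain analyze from above, then restore the default:
set enable_seqscan = on;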
Why does the planner choose that way, and what can I do to make it choose the
better plan, preferably without specifying a LIMIT and without a possibly
unreasonably high statistics target for ext_feeder_item.feed_id?
Although the index scans are fast enough, their cost estimate is much higher.
This suggests you need to tweak your planner cost settings:
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
I'd probably start by reducing random_page_cost, if you have a reasonable
disk system, and by making sure effective_cache_size is set accurately.
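For example (illustrative values only, not recommendations; test per session
first, then persist in postgresql.conf):

set random_page_cost = 2.0;        -- default 4.0; lower it if most reads are served from cache
set effective_cache_size = '2GB';  -- roughly shared_buffers plus the OS cache available to Postgres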
--
Richard Huxton
Archonet Ltd