Markus Bertheau wrote:
I'm getting a plan that uses a sequential scan on ext_feeder_item instead of
several index scans, which slows down the query significantly:
# explain analyze
select fi.pub_date
from ext_feeder_item fi
where fi.feed_id in (select id from ext_feeder_feed ff where ff.is_system)
order by pub_date desc;
Sort (cost=298545.70..299196.46 rows=260303 width=8) (actual time=89299.623..89302.146 rows=807 loops=1)
Using LIMIT in the subquery, I can see that starting at 50 values in the IN
list the planner starts to prefer the seq scan. Plan for LIMIT 49:
Sort (cost=277689.24..277918.39 rows=91660 width=8) (actual time=477.769..478.193 rows=137 loops=1)
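For reference, the LIMIT variant changes only the subquery; the test above
was of this shape:

# explain analyze
select fi.pub_date
from ext_feeder_item fi
where fi.feed_id in (select id from ext_feeder_feed ff where ff.is_system limit 49)
order by pub_date desc;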
Note that the rows estimate for the index scan is way off. Increasing the
statistics target for ext_feeder_item.feed_id to 100 lets the planner favor
the index scan up to LIMIT 150 in the subquery.
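For completeness, that change is the usual per-column setting plus a fresh
ANALYZE so the planner sees the new statistics:

alter table ext_feeder_item alter column feed_id set statistics 100;
analyze ext_feeder_item;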
With enable_seqscan=false, I see that the index scan plan continues to
outperform the seq scan plan even with LIMIT 1500 in the subquery (1196
values actually returned from it):
Sort (cost=100925142.27..100925986.74 rows=337787 width=8) (actual time=102.111..104.627 rows=807 loops=1)
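The seq scan was discouraged per session like this; note that the ~100000000
in the cost above is the constant penalty the planner adds to disabled plan
types, so that estimate is not comparable to the earlier ones:

set enable_seqscan = off;  -- per session; the planner avoids seq scans where possible
-- rerun the explain analyze from above, then restore the default:
set enable_seqscan = on;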
Why does the planner choose that way, and what can I do to make it choose the
better plan, preferably without specifying a LIMIT and without a possibly
unreasonably high statistics target for ext_feeder_item.feed_id?
Although the index scans are fast enough, their cost estimate is much higher.
This suggests you need to tweak your planner cost settings:
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
I'd probably start by reducing random_page_cost, if you have a reasonable
disk system, and by making sure effective_cache_size is set accurately.
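For example (illustrative values only, not recommendations; test per session
first, then persist in postgresql.conf):

set random_page_cost = 2.0;        -- default 4.0; lower it if most reads are served from cache
set effective_cache_size = '2GB';  -- roughly shared_buffers plus the OS cache available to Postgres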
--
Richard Huxton
Archonet Ltd