Questions:
* What can I do to reduce the estimated row count for the query?
* Why does clustering drive the estimated cost for the index scan down
so much? Does a change in correlation from .72 to 1 really make that
much of a difference? (See the pg_stats sketch just after this list.)
* Can I convince the query planner to use the index scan without
clustering the table on the order_statuses_id index or setting
enable_seqscan = off?
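
For reference, the correlation number comes from the planner's
per-column statistics in the pg_stats view; a quick way to look at it
(assuming the column in question is orders.order_statuses_id):

    -- correlation ranges from -1 to 1; values near 1 mean the heap is
    -- almost in index order, so an index scan touches few distinct
    -- pages. n_distinct also feeds the row-count estimate.
    SELECT tablename, attname, n_distinct, correlation
    FROM pg_stats
    WHERE tablename = 'orders'
      AND attname = 'order_statuses_id';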
After some more digging on the mailing list, I found some comments on
effective_cache_size. Raising it from the default of 1000 does push
the estimated cost for the index scan below that of the sequential
scan, but not by much.
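
The two plans below came from session-level settings along these
lines. The query text is my reconstruction from the plan output, so
the selected column and the order_statuses.id column name are guesses:

    -- Session-local setting; effective_cache_size is measured in 8 kB
    -- pages here, so 10000 is roughly 80 MB of assumed cache.
    SET effective_cache_size = 10000;

    -- Reconstructed query: the InitPlan in the output is the
    -- uncorrelated subquery looking up the id of the 'new' status.
    EXPLAIN ANALYZE
    SELECT o.id
    FROM orders o
    WHERE o.order_statuses_id = (SELECT id FROM order_statuses
                                 WHERE id_name = 'new');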
With SET effective_cache_size = 1000:
 Seq Scan on orders o  (cost=1.20..11395.53 rows=7029 width=8) (actual time=280.148..281.512 rows=105 loops=1)
   Filter: (order_statuses_id = $0)
   InitPlan
     ->  Seq Scan on order_statuses  (cost=0.00..1.20 rows=1 width=4) (actual time=0.012..0.020 rows=1 loops=1)
           Filter: ((id_name)::text = 'new'::text)
 Total runtime: 281.700 ms
With SET effective_cache_size = 10000:
 Index Scan using orders_status_btree_idx on orders o  (cost=1.20..9710.91 rows=7029 width=8) (actual time=0.050..0.372 rows=105 loops=1)
   Index Cond: (order_statuses_id = $0)
   InitPlan
     ->  Seq Scan on order_statuses  (cost=0.00..1.20 rows=1 width=4) (actual time=0.016..0.024 rows=1 loops=1)
           Filter: ((id_name)::text = 'new'::text)
The ratio between the estimated costs (11395.53 vs. 9710.91, barely
1.2:1) is still nowhere near the ratio of the actual times (about
281 ms vs. 0.4 ms, several hundred to one).
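
One more knob that bears directly on this gap, sketched here as an
untried experiment rather than anything from the thread:
random_page_cost, which scales the estimated cost of non-sequential
page fetches, i.e. exactly the part of the index-scan estimate that
high correlation discounts. Its default is 4.0.

    -- Untried experiment: tell the planner a random page fetch costs
    -- 2x a sequential one instead of the default 4x. This shrinks the
    -- index-scan estimate without touching enable_seqscan.
    SET random_page_cost = 2;
    EXPLAIN ANALYZE
    SELECT o.id
    FROM orders o
    WHERE o.order_statuses_id = (SELECT id FROM order_statuses
                                 WHERE id_name = 'new');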