Paul:
Quite like Tom, I too think that its the first query that is more intriguing than the second one. (The expected cost for the indexscan (A) query is 4x the expected time for the 'Sequential Scan' (B) query !!)
Could you provide with the (complete output of) EXPLAIN ANALYZE times for both of these queries ? That would tell how much time it actually took as compared to the expected times.
Tom:
There is one thing though, that I couldn't really understand. Considering that A's correlation in pg_stats being very high compared to B, isn't it 'a better candidate' for a sequential scan as compared to B in this scenario ? Or is it the other way around ?
Regards,
Robins Tharakan
On 5/4/07, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Paul Smith <paullocal@xxxxxxxxxx> writes:
> If I do
> EXPLAIN SELECT * FROM x ORDER BY a;
> it says
> Index Scan using y on x (cost=0.00..2903824.15 rows=1508057 width=152)
> That's what I'd expect
> However, if I do
> EXPLAIN SELECT * FROM x ORDER BY b;
> it says
> Sort (cost=711557.34..715327.48 rows=1508057
> width=152)
> Sort Key:
> b
> -> Seq Scan on x (cost=0.00..53203.57 rows=1508057 width=152)
> Why doesn't it use the other index?
You have the question backwards: given those cost estimates, I'd wonder
why it doesn't do a sort in both cases. Offhand I think the sort cost
estimate is pretty much independent of the data itself, so it should
have come out with a cost near 715327 for sorting on A, so why's it
using an indexscan that costs more than 4x as much?
The indexscan cost estimate varies quite a bit depending on the
estimated correlation (physical ordering) of the column, so seeing
it do different things in the two cases isn't surprising in itself.
But I think there's some relevant factor you've left out of the example.
As for getting the estimates more in line with reality, you probably
need to play with random_page_cost and/or effective_cache_size.
> Any ideas? To me it looks like a bug in the planner. I can't think of
> any logical reason not to use an existing index to retrieve a sorted
> listing of the data.
Sorry, but using a forced sort frequently *is* faster than a full-table
indexscan. It all depends on how much locality of reference there is,
ie how well the index order and physical table order match up. The
planner's statistical correlation estimate and cost parameters may be
far enough off to make it pick the wrong choice, but it's not a bug that
it considers the options.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Robins