At 16:26 04/05/2007, you wrote:
Paul Smith wrote:
Why doesn't it use the other index? If use 'set enable_seqscan=0'
then it does.
Just a guess, but is the table clustered on column a? Maybe not
explicitly, but was it loaded from data that was sorted by a?
I wouldn't have thought so - a is pretty 'random' as far as order of
insertion goes. On the other hand 'b' (the one whose index doesn't
get used) is probably pretty correlated - 'b' is the date when the
entry was added to the table, so they would be added in order of 'b'
(they also get deleted after a while, and I'm not sure how PGSQL
re-uses deleted rows that have been vacuumed)
Analyzer calculates the correlation between physical order and each
column. The planner will favor index scans instead of sorting when
the correlation is strong, and it thinks the data doesn't fit in
memory. Otherwise an explicitly sort will result in less I/O and be
therefore more favorable.
Ah, I see.
You can check the correlation stats with:
SELECT tablename, attname, correlation FROM pg_stats where tablename='x';
There I get
x | a | 0.977819
x | b | 0.78292
This is a bit odd, because I'd have thought they'd be more correlated
on 'b' than 'a'..
I tried using EXPLAIN ANALYZE to see how long it actually took:
- seq scan - 75 secs
- index scan - 13 secs
- seq scan - 77 secs
(I tried the seq scan version after the index scan as well to see
if disk caching was a factor, but it doesn't look like it)
That won't flush the heap pages from cache...
No, I know, but it would mean that if the pages were being loaded
into disk cache by the first scan which would make the second scan
quicker, it would probably make the third one quicker as well.
How much memory do you have and how large is the table?
The table is about 300MB. I have 2GB RAM on my PC (but most of it is
in use - the disk cache size is currently 600MB).
I suspect that the planner thinks it doesn't fit in memory, and
therefore favors the seqscan+sort plan which would require less random I/O,
but in reality it's in cache and the index scan is faster because it
doesn't need to sort. Have you set your effective_cache_size properly?
I haven't set that at all - it's the default..
If I set this to 51200 (I think that means 400MB) then it does use
the index scan method, so thanks for this bit of info.
Paul VPOP3 - Internet Email Server/Gateway
support@xxxxxxxxxx http://www.pscs.co.uk/