Re: Index not being used in sorting of simple table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At 16:26 04/05/2007, you wrote:
Paul Smith wrote:
Why doesn't it use the other index? If use 'set enable_seqscan=0' then it does.

Just a guess, but is the table clustered on column a? Maybe not explicitly, but was it loaded from data that was sorted by a?

I wouldn't have thought so - a is pretty 'random' as far as order of insertion goes. On the other hand 'b' (the one whose index doesn't get used) is probably pretty correlated - 'b' is the date when the entry was added to the table, so they would be added in order of 'b' (they also get deleted after a while, and I'm not sure how PGSQL re-uses deleted rows that have been vacuumed)

Analyzer calculates the correlation between physical order and each column. The planner will favor index scans instead of sorting when the correlation is strong, and it thinks the data doesn't fit in memory. Otherwise an explicitly sort will result in less I/O and be therefore more favorable.

Ah, I see.

You can check the correlation stats with:
SELECT tablename, attname, correlation FROM pg_stats where tablename='x';

There I get
 x   | a  |     0.977819
 x   | b  |     0.78292

This is a bit odd, because I'd have thought they'd be more correlated on 'b' than 'a'..

I tried using EXPLAIN ANALYZE to see how long it actually took:
- seq scan - 75 secs
- index scan - 13 secs
- seq scan - 77 secs

(I tried the seq scan version after the index scan as well to see if disk caching was a factor, but it doesn't look like it)

That won't flush the heap pages from cache...

No, I know, but it would mean that if the pages were being loaded into disk cache by the first scan which would make the second scan quicker, it would probably make the third one quicker as well.

How much memory do you have and how large is the table?

The table is about 300MB. I have 2GB RAM on my PC (but most of it is in use - the disk cache size is currently 600MB).

I suspect that the planner thinks it doesn't fit in memory, and therefore favors the seqscan+sort plan which would require less random I/O, but in reality it's in cache and the index scan is faster because it doesn't need to sort. Have you set your effective_cache_size properly?

I haven't set that at all - it's the default..

If I set this to 51200 (I think that means 400MB) then it does use the index scan method, so thanks for this bit of info.


Paul                            VPOP3 - Internet Email Server/Gateway
support@xxxxxxxxxx                      http://www.pscs.co.uk/




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux