On 27/09/11 22:05, anthony.shipman@xxxxxxxxxxxxx wrote:
What I really want is to just read a sequence of records in timestamp order between two timestamps. The number of records to be read may be in the millions totalling more than 1GB of data so I'm trying to read them a slice at a time but I can't get PG to do just this. If I use offset and limit to grab a slice of the records from a large timestamp range then PG will grab all of the records in the range, sort them on disk and return just the slice I want. This is absurdly slow. The query that I've shown is one of a sequence of queries with the timestamp range progressing in steps of 1 hour through the timestamp range. All I want PG to do is find the range in the index, find the matching records in the table and return them. All of the planner's cleverness just seems to get in the way.
It is not immediately clear that the planner is making the wrong choices here. Index scans are not always the best choice; how well they perform depends heavily on the correlation between the column concerned and the physical order of the table's heap file. I suspect the planner is choosing the bitmap scan because that correlation is low (consult pg_stats to see). Now if you think that the table's heap data is cached anyway, then this is not such an issue - but you have to tell the planner that by reducing random_page_cost (as advised previously). Give it a try and report back!
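For what it's worth, here is a rough sketch of the checks I mean. The table name records, the column name ts, the timestamps and the value 2.0 are all made up for illustration, so substitute your own. A correlation near 1 or -1 means the heap is roughly in timestamp order and a plain index scan is cheap; a value near 0 means the matching rows are scattered across the heap, which is when the planner tends to prefer a bitmap scan.

```sql
-- Hypothetical names: table "records", timestamp column "ts".
-- How closely does the physical row order follow the timestamp order?
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'records'
  AND attname = 'ts';

-- Tell the planner that random page reads are cheap (default is 4.0).
-- This only affects the current session; 2.0 is an arbitrary trial value.
SET random_page_cost = 2.0;

-- Re-run one of the hourly slices and compare the plan and timings.
EXPLAIN ANALYZE
SELECT *
FROM records
WHERE ts >= '2011-09-27 00:00:00'
  AND ts <  '2011-09-27 01:00:00'
ORDER BY ts;

-- Go back to the configured value afterwards.
RESET random_page_cost;
```

If the plan and timings improve at the lower setting, that suggests the heap really is mostly cached and the setting could be made permanent; if not, the low correlation is probably the real cost.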
regards
Mark