Re: overzealous sorting?

Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> · Tue, 27 Sep 2011 22:22:27 +1300

On 27/09/11 22:05, anthony.shipman@xxxxxxxxxxxxx wrote:

What I really want is to just read a sequence of records in timestamp order
between two timestamps. The number of records to be read may be in the
millions totalling more than 1GB of data so I'm trying to read them a slice
at a time but I can't get PG to do just this.

If I use offset and limit to grab a slice of the records from a large
timestamp range then PG will grab all of the records in the range, sort them
on disk and return just the slice I want. This is absurdly slow.

The query that I've shown is one of a sequence of queries with the timestamp
range progressing in steps of 1 hour through the timestamp range. All I want
PG to do is find the range in the index, find the matching records in the
table and return them. All of the planner's cleverness just seems to get in
the way.

It is not immediately clear that the planner is making the wrong choices
here. Index scans are not always the best choice; it depends heavily on
the correlation of the column concerned to the physical order of the
table's heap file. I suspect the planner chose the bitmap scan because
that correlation is low (check the correlation column in pg_stats to
see). If you think the table's heap data is cached anyway, then this is
not such an issue, but you have to tell the planner that by reducing
random_page_cost (as advised previously). Give it a try and report back!
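
To make the "correlation" idea concrete, here is a rough Python sketch of
what the statistic measures: how closely the order of the rows on disk
follows the sorted order of the column. This is only a toy illustration.
The function name and the sample data are made up for the example, and
PostgreSQL derives its own figure from a sample taken by ANALYZE rather
than by running anything like this.

```python
import random


def physical_correlation(values):
    """Correlation between heap position and sorted rank of each value.

    `values` is the column as laid out on disk (position 0 first).
    Hypothetical helper for illustration, not PostgreSQL's own code.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two rows")

    # Rank of each row's value in sorted order (ties broken by position).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, pos in enumerate(order):
        ranks[pos] = rank

    # Positions and ranks are both permutations of 0..n-1, so they share
    # the same mean and variance; Pearson's r reduces to cov / var.
    mean = (n - 1) / 2.0
    cov = sum((p - mean) * (r - mean) for p, r in zip(range(n), ranks))
    var = sum((p - mean) ** 2 for p in range(n))
    return cov / var


# Rows appended in timestamp order: heap order matches index order.
in_order = list(range(10000))

# Same rows, scattered across the heap (fixed seed for repeatability).
scattered = in_order[:]
random.Random(42).shuffle(scattered)

corr_in_order = physical_correlation(in_order)
corr_scattered = physical_correlation(scattered)
```

With the in-order data the result is 1; with the scattered data it comes
out close to 0. A value near +1 or -1 means an index scan reads the heap
almost sequentially, so the planner costs it cheaply. Near 0, each index
entry is expected to hit a different heap page, and that is where
random_page_cost (4.0 by default) tips the planner towards a bitmap scan.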

regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)