On Mon, Apr 7, 2008 at 1:42 PM, Alban Hertroys
<dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Have you tried this query with enable_seqscan=off? If my guess is right
> (and the planner's, in that case) it'd be even slower.

set enable_seqscan=off;
explain select distinct datestamp from vals;

                                           QUERY PLAN
----------------------------------------------------------------------------------------------
 Unique  (cost=115003047.47..115380004.83 rows=4263 width=4)
   ->  Sort  (cost=115003047.47..115191526.15 rows=75391472 width=4)
         Sort Key: datestamp
         ->  Seq Scan on vals  (cost=100000000.00..101531261.72 rows=75391472 width=4)

It appears to be doing a sequential scan regardless of the setting, as
if it doesn't believe it can use the index for some reason.

> Something that might help you, but I'm not sure whether it might hurt
> the performance of other queries, is to cluster that table on
> val_datestamp_idx. That way the records are already (mostly) sorted on
> disk in the order of the datestamps, which seems to be the brunt of the
> above query plan.

That's a good thought. I'll give that a try this evening when the DB
has some downtime and see what happens.

> There seems to be quite a bit of overlap in your index definitions.
> From my experience this can confuse the planner.
>
> I suggest you combine them, but not knowing your data... Maybe
> rewriting your UNIQUE constraint to (val_dur, datestamp, eid, sid)
> would be enough to replace all those other indexes.
> If not, it's probably better to have one index per column, so that the
> planner is free to combine them as it sees fit. That'd result in a
> bitmap index scan, btw.

I can take a look at the other indices again, but those are all in
place for specific other queries, which generally involve where clauses
of the a=1, b=2, c=3, datestamp>5 variety. They were created
specifically in response to sequential scans showing up in those
queries (and had the proper effect of fixing them!).

> I'm not a postgresql tuning expert (I know my way around though);
> other people can explain it to you way better than I can. Bruce
> Momjian for example: http://www.linuxjournal.com/article/4791

I'll take a look at that, thanks.

> That calculation doesn't look familiar to me, I'm more used to:
> select pg_size_pretty(pg_relation_size('...'));
>
> You can put the name of any relation in there, be it tables, indexes,
> etc.
>
> 11GB is pretty large for an index on an integer column, especially
> with only 75M rows: that's 146 bytes/row in your index. Maybe your
> index got bloated somehow? I think it should be about a tenth of that.

pg_total_relation_size('...') gives the number of bytes for the table
plus all associated indices; pg_relation_size('...') gives the size of
just the table. The difference between the two should be the total
bytes taken up by the 5 indices (11 index columns in all), giving a
back-of-the-envelope estimate of 1GB for the size of the datestamp
index alone.

I am fairly certain that I didn't give pg 1GB to fit that index in
memory, so I'll try upping its total available memory tonight and see
if that doesn't improve things.

I appreciate the responses so far! I'm used to several minutes for some
of the complex queries on this DB, but 12.5 minutes for a select
distinct just seems wrong. :)

I've collected the concrete commands I'm planning to run tonight in a
P.S. below, in case anyone spots a problem with them before I pull the
trigger.

--
- David T. Wilson
Princeton Satellite Systems
david.t.wilson@xxxxxxxxx
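
P.S. Collecting the suggestions from this thread into the statements I
intend to run, so anyone can correct me if I've misread something. For
the clustering idea, assuming the index really is named
val_datestamp_idx:

    -- Rewrites the whole table in index order; takes an exclusive
    -- lock, so it has to wait for tonight's downtime window.
    CLUSTER val_datestamp_idx ON vals;
    -- CLUSTER doesn't refresh planner statistics, so follow up with:
    ANALYZE vals;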
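
For the combined UNIQUE constraint, my best guess at the incantation.
The constraint names here are made up for illustration; I'd look up the
real existing name with \d vals first:

    BEGIN;
    -- hypothetical name for the current constraint
    ALTER TABLE vals DROP CONSTRAINT vals_unique_key;
    -- Alban's suggested column order
    ALTER TABLE vals ADD CONSTRAINT vals_val_dur_datestamp_eid_sid_key
        UNIQUE (val_dur, datestamp, eid, sid);
    COMMIT;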
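
To replace my subtraction-based estimate with real numbers, per Alban's
note that pg_relation_size() accepts any relation, including an index:

    -- total index overhead, by subtraction (my back-of-the-envelope)
    SELECT pg_size_pretty(pg_total_relation_size('vals')
                          - pg_relation_size('vals'));
    -- the datestamp index by itself, which should settle the question
    -- of whether it's bloated
    SELECT pg_size_pretty(pg_relation_size('val_datestamp_idx'));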
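
And for the memory bump I'm assuming the usual suspects in
postgresql.conf; the values below are placeholders I'll size to the
machine, not recommendations:

    # postgresql.conf -- placeholder values, sized to the box later
    shared_buffers = 1600MB       # requires a server restart (and
                                  # possibly a larger kernel shmmax)
    effective_cache_size = 4GB    # hint to the planner about how much
                                  # OS cache is available to postgres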