On Tue, Dec 13, 2011 at 1:57 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes:
>> I've got a 5GB table with about 12 million rows.
>> Recently, I had to select the distinct values from just one column.
>> The planner chose an index scan. The query took almost an hour.
>> When I forced index scan off, the query took 90 seconds (full table scan).
>
> Usually, we hear complaints about the opposite. Are you using
> nondefault cost settings?

Cost settings had not been changed until a few minutes ago, when your
response prompted me to try a few things. I ended up changing
random_page_cost to 16.0 (from 4.0), partly because the H/W RAID I'm
using is awfully bad at random I/O. I'll experiment and keep tabs on
performance to see whether this has a negative effect on other queries.

>> The planner estimated 70,000 unique values when, in fact, there are 12
>> million (the value in this column is *almost* but not quite unique).
>> What's more, despite bumping the statistics on that column up to 1000
>> and re-analyzing, the planner now thinks that there are 300,000 unique
>> values.
>
> Accurate ndistinct estimates are hard, but that wouldn't have much of
> anything to do with this particular choice, AFAICS.
>
>> How can I tell the planner that a given column is much more unique
>> than, apparently, it thinks it is?
>
> 9.0 and up have ALTER TABLE ... ALTER COLUMN ... SET n_distinct.

D'oh! I'm on 8.4.10+patches. This may provide the necessary push.

--
Jon
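
[For reference: the random_page_cost change discussed above can be tried per-session before committing it to postgresql.conf. A minimal sketch follows; the table and column names ("big_table", "some_col") are placeholders, since the thread doesn't name them.]

```sql
-- Session-level experiment: raise random_page_cost and compare the plan
-- before making the change permanent in postgresql.conf.
SET random_page_cost = 16.0;
EXPLAIN SELECT DISTINCT some_col FROM big_table;

-- Revert for the rest of the session if the new plan regresses elsewhere:
RESET random_page_cost;
```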
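[The "bumping the statistics on that column up to 1000 and re-analyzing" step corresponds to standard commands available in 8.4; a sketch, again with placeholder names:]

```sql
-- Raise the per-column statistics target and re-gather statistics:
ALTER TABLE big_table ALTER COLUMN some_col SET STATISTICS 1000;
ANALYZE big_table;

-- Inspect the ndistinct estimate the planner now holds:
SELECT n_distinct FROM pg_stats
WHERE tablename = 'big_table' AND attname = 'some_col';
```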
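[And the 9.0+ override Tom mentions, sketched with the same placeholder names. A negative n_distinct is interpreted as a fraction of the row count, so -1 declares the column effectively unique; the override takes effect at the next ANALYZE.]

```sql
-- PostgreSQL 9.0 and later only:
ALTER TABLE big_table ALTER COLUMN some_col SET (n_distinct = -1);
ANALYZE big_table;  -- ANALYZE stores the overridden estimate
```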