> From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
> "George Pavlov" <gpavlov@xxxxxxxxxxxxxx> writes:
> > I am curious what could make the PA query to ignore the index. What are
> > the specific stats that are being used to make this decision?
>
> you don't have the column's statistics target set high enough to
> track all the interesting values --- or maybe just not high enough to
> acquire sufficiently accurate frequency estimates for them.
> Take a look at the pg_stats row for the column ...
>
> (The default statistics target is 10, which is widely considered too
> low --- you might find 100 more suitable.)

Well, it seems that it would be more beneficial for me to set it LOWER
than the default 10. I get better performance when the stats are less
accurate, because then the optimizer seems more likely to choose the
index! States that appear in pg_stats.most_common_vals most often result
in a Seq Scan, whereas states that are not in it definitely get the
Index Scan. For all states, even the largest ones (15% of the data), the
Index Scan performs better.

So, for example, with SET STATISTICS 10 my benchmark query for a state
like Indiana (2981 rows, ~3% of total) runs in 132ms. If I SET
STATISTICS 100, Indiana gets onto the most_common_vals list for the
column, the query does a Seq Scan, and its run time jumps to 977ms! If I
go the other way and SET STATISTICS 1 (or 0), I can bring the list down
to one entry (setting it to 0 seems equivalent and still keeps the one
most common entry!?), and I get the Index Scan for all states except
that one most common state.

But, of course, I don't want to undermine the whole stats mechanism. I
just want the system to use the index that is so helpful and brings
run times down by a factor of 4-8! What am I missing here?

George
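
For anyone who wants to reproduce the comparison described above, the steps could be sketched roughly as follows. This is only an illustration: the table name "addresses", the column name "state", and the value 'IN' for Indiana are hypothetical placeholders, and the final SELECT is a stand-in for the actual benchmark query, which is not shown in this thread.

```sql
-- Hypothetical names: table "addresses", column "state". Substitute your own.

-- Change the per-column statistics target, then re-gather statistics
-- so that the new target takes effect.
ALTER TABLE addresses ALTER COLUMN state SET STATISTICS 100;
ANALYZE addresses;

-- Check which values the planner now tracks as most common,
-- along with their estimated frequencies.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'addresses'
  AND attname = 'state';

-- Compare the chosen plan (Seq Scan vs. Index Scan) and the actual
-- run time for a single value. 'IN' is an assumed encoding for Indiana.
EXPLAIN ANALYZE
SELECT * FROM addresses WHERE state = 'IN';
```

Repeating the same sequence with SET STATISTICS 10 and SET STATISTICS 1 should show whether the value moves on or off the most_common_vals list and which plan is picked in each case.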