On 25 March 2015 at 19:07, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> Also, I doubt that that is the problem in the first place. If you collect a
> sample of 30,000 (which the default target size of 100 does), and the
> frequency of the second most common is really 0.00307333 at the time you
> sampled it, you would expect to find it 92 times in the sample. The chances
> against actually finding 1 instead of around 92 due to sampling error are
> astronomical.

It may be that the distribution of values is very volatile; we hope that the
increased stats target for this column (raised from the default of 100 to
1000), together with frequent autovacuum and autoanalyze, will help keep the
estimates correct.

It seems that ANALYZE did find some other records (<> 'PRINTED'), as the stats
show: there was only one value in the MCV list, its frequency was 0.996567,
and the fraction of nulls was 0, which leaves 0.003+ for other values. But
because none of those values were in the MCV/MCF lists (most_common_vals and
most_common_freqs), they were all treated as equally frequent. They are
certainly not equally frequent.

I do not know why some values were found (they appear in histogram_bounds)
but are not part of the MCV list; as you say, the likelihood of only 1 item
being found is very small.

Does anyone know the criteria for a value to be included in the MCV list?

> The problem seems to be rapidly changing stats, not too small of a target
> size (unless your original target size was way below the current default
> value, forgive me if you already reported that, I didn't see it anywhere).
> Maybe it would work better if you built the partial index where status =
> 'NOT_YET_PRINTED', instead of !='PRINTED'.

Thanks, we did create a partial index on 'NOT_YET_PRINTED' today to help with
these kinds of queries.
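
For anyone who wants to check the arithmetic in this thread, here is a rough Python sketch. It is not how the planner itself computes anything; it only reproduces the numbers quoted above. The function names are made up for illustration, and the binomial model assumes the sampled rows are independent draws, which is a simplification.

```python
import math

# Sample size quoted in the thread: 30,000 rows at the default target of 100.
SAMPLE_ROWS = 30_000


def expected_hits(freq, sample_rows=SAMPLE_ROWS):
    """Expected number of sampled rows holding a value with the given frequency."""
    return freq * sample_rows


def prob_at_most(k, freq, sample_rows=SAMPLE_ROWS):
    """Binomial P(X <= k): chance of seeing the value k times or fewer.

    Assumption: each sampled row is an independent draw with probability freq.
    """
    return sum(
        math.comb(sample_rows, i) * freq**i * (1.0 - freq) ** (sample_rows - i)
        for i in range(k + 1)
    )


def leftover_fraction(mcv_freqs, null_frac):
    """Fraction of rows left for values that are not in the MCV list."""
    return 1.0 - sum(mcv_freqs) - null_frac


if __name__ == "__main__":
    freq = 0.00307333  # frequency of the second most common value, from the thread
    print("expected hits in sample:", round(expected_hits(freq)))  # about 92
    print("P(at most 1 hit):", prob_at_most(1, freq))
    print("fraction outside MCV list:", leftover_fraction([0.996567], 0.0))
```

The last figure is what gets shared among all non-MCV values when the MCV list holds only 'PRINTED', which is why rare and not-so-rare statuses end up with the same estimate.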