On Wed, Mar 25, 2015 at 1:00 PM, Feike Steenbergen <feikesteenbergen@xxxxxxxxx> wrote:
> On 25 March 2015 at 19:07, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>> Also, I doubt that that is the problem in the first place. If you collect a
>> sample of 30,000 (which the default target size of 100 does), and the
>> frequency of the second most common is really 0.00307333 at the time you
>> sampled it, you would expect to find it 92 times in the sample. The chances
>> against actually finding 1 instead of around 92 due to sampling error are
>> astronomical.
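For what it's worth, here is a quick back-of-the-envelope check of that
claim, using the numbers from the paragraph above and modeling the
sample as binomial(n = 30000, p = 0.00307333):

    -- P(X = 0) + P(X = 1): chance of seeing the value at most once
    -- in a 30,000-row sample if its true frequency is 0.00307333
    SELECT power(1 - 0.00307333::float8, 30000)
         * (1 + 30000 * 0.00307333 / (1 - 0.00307333)) AS p_at_most_one;
    -- => roughly 7e-39

So "astronomical" is, if anything, an understatement.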
> It may be that the distribution of values is very volatile; we hope
> the increased stats target (from the default of 100 to 1000 for this
> column) and frequent autovacuum and autoanalyze runs will help keep
> the estimates correct.
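As an aside, a per-column target bump like that is done roughly as
follows (using the print_list table named later in this thread;
'status' is only a stand-in for the actual column name):

    -- raise the per-column statistics target, then re-gather stats
    ALTER TABLE print_list ALTER COLUMN status SET STATISTICS 1000;
    ANALYZE print_list;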
> It seems that it did find some other records (<> 'PRINTED'), as is
> demonstrated in the stats where there was only one value in the MCV
> list: the frequency was 0.996567 and the fraction of nulls was 0,
> therefore leaving 0.003+ for other values. But because none of those
> values were in the MCV and MCF lists, they were all treated as equals.
> They are certainly not equal.
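Right - for a value that is absent from the MCV list, the planner
roughly assumes the leftover frequency (1 - sum of the MCFs - the null
fraction) is spread evenly over the remaining n_distinct values. You
can look at the inputs to that estimate directly (again, 'status' is a
stand-in for the real column name):

    SELECT null_frac, n_distinct, avg_width,
           most_common_vals, most_common_freqs
    FROM   pg_stats
    WHERE  tablename = 'print_list'
      AND  attname   = 'status';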
Now that I look back at the first post you made, it certainly looks like the statistics target was set to 1 when that was analyzed, not to 100. But it doesn't look quite correct for that, either.
What version of PostgreSQL are you running? 'select version();'
What do you get when you do "analyze verbose print_list"?
How can the avg_width be 4 when the vast majority of entries are 7 characters long?
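One way to cross-check that against the table itself (avg_width is
measured in bytes of the stored datum, so 7-character values should
normally show up as about 8 bytes each; 'status' is again a stand-in
for the real column name):

    -- compare the on-disk byte width with the character length
    SELECT avg(pg_column_size(status)) AS avg_bytes,
           avg(length(status))         AS avg_chars
    FROM   print_list;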
Cheers,
Jeff