Re: Index only scan sometimes switches to sequential scan for small amount of rows

Feike Steenbergen <feikesteenbergen@xxxxxxxxx> · Wed, 25 Mar 2015 21:00:44 +0100

On 25 March 2015 at 19:07, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

> Also, I doubt that that is the problem in the first place.  If you collect a
> sample of 30,000 (which the default target size of 100 does), and the
> frequency of the second most common is really 0.00307333 at the time you
> sampled it, you would expect to find it 92 times in the sample. The chances
> against actually finding 1 instead of around 92 due to sampling error are
> astronomical.

It can be that the distribution of values is very volatile; we hope
the increased stats target (from the default=100 to 1000 for this
column) and frequent autovacuum and autoanalyze helps in keeping the
estimates correct.

It seems that it did find some other records (<> 'PRINTED), as is
demonstrated in the stats where there was only one value in the MCV
list: the frequency was 0.996567 and the fraction of nulls was 0,
therefore leaving 0.03+ for other values. But because none of them
were in the MCV and MCF list, they were all treated as equals. They
are certainly not equal.

I not know why some values were found (they are mentioned in the
histogram_bounds), but are not part of the MCV list, as you say, the
likeliness of only 1 item being found is very small.

Does anyone know the criteria for a value to be included in the MCV list?

> The problem seems to be rapidly changing stats, not too small of a target
> size (unless your original target size was way below the current default
> value, forgive me if you already reported that, I didn't see it anywhere).
> Maybe it would work better if you built the partial index where status =
> 'NOT_YET_PRINTED', instead of !='PRINTED'.

Thanks, we did create a partial index on 'NOT_YET_PRINTED' today to
help aiding these kind of queries.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance