Hi,

On 25.1.2015 00:33, AlexK987 wrote:
> The documentation states that "The extent of analysis can be
> controlled by adjusting the default_statistics_target configuration
> variable". It looks like I can tell Postgres to create more
> histograms with more bins, and more distinct values. This implicitly
> means that Postgres will use a larger random subset to calculate
> statistics.
>
> However, this is not what I want. My data may be quite skewed, and I
> want full control over the size of the sample. I want to explicitly
> tell Postgres to analyze the whole table. How can I accomplish that?

I don't think there's an official way to do that - at least I can't
think of one. The only thing you can do is increase the statistics
target (either globally by setting default_statistics_target, or per
column using ALTER TABLE ... SET STATISTICS - see the sketch below).

As you noticed, however, this controls two things - the sample size and
how detailed the statistics (MCV list / histogram) will be. The
statistics target is used as an upper bound on the number of MCV items
/ histogram bins, and the number of sampled rows is
(300 * statistics_target). With default_statistics_target = 10000
(which is the maximum allowed value since 9.0), this produces very
detailed stats and uses a sample of ~3M rows.

It's a bit more complicated though, because there's an algorithm that
decides how many MCV items / histogram buckets to actually create,
based on the data. So you may not get more detailed stats even when
using a larger sample.

That being said, I really doubt increasing the statistics target above
10000 (or even sampling the whole table) will help you in practice.
It might be worth showing an example of a bad estimate with your data,
or maybe a test case to play with.
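For illustration, a minimal sketch of both knobs - the table name "t"
and column name "val" are made up here:

    -- per column: raise the target for one skewed column only
    ALTER TABLE t ALTER COLUMN val SET STATISTICS 10000;
    ANALYZE t;  -- samples up to 300 * 10000 = ~3M rows

    -- or for the current session, affecting all columns:
    SET default_statistics_target = 10000;
    ANALYZE t;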
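To check how many MCV items / histogram buckets ANALYZE actually built
(again with the made-up names), you can query pg_stats:

    SELECT n_distinct,
           array_length(most_common_vals, 1)      AS mcv_items,
           array_length(histogram_bounds, 1) - 1  AS histogram_buckets
      FROM pg_stats
     WHERE tablename = 't' AND attname = 'val';

If those numbers stay well below the target, that's the algorithm
deciding the data doesn't warrant more detail, and a larger sample
alone won't change that.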
--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services