On 21.01.2013 17:29, AJ Weber wrote:
> I was under the impression that the default_statistics_target was a
> percentage of rows to analyze. Maybe this is not the case?

Nope.

> I ran an analyze during a "quiet point" last night and, for a few of my
> large tables, I didn't get what I consider a reasonable sampling of
> rows. When running with "verbose" enabled, it appeared that a maximum
> of 240000 rows were being analyzed, including on tables exceeding 4-8
> million rows. My default_statistics_target = 80.
>
> Shouldn't I be analyzing a larger percentage of these big tables?

Analyze only needs a fairly small random sample of the rows in the table
to get a picture of what the data looks like. Compare with, e.g., opinion
polls: you only need to sample a few thousand people to get results with
reasonable error bounds.
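(If memory serves, the sample size ANALYZE uses is 300 * the statistics
target, per std_typanalyze() in src/backend/commands/analyze.c, so with
the default target of 100 it reads a 30000-row sample no matter how
large the table is.)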
That's sufficient for estimating the histogram. Estimating ndistinct
(the number of distinct values in a column) is a different story, and
it's well known that ndistinct estimates are sometimes wildly wrong.
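If ndistinct is badly off for a particular column, you can check what
ANALYZE came up with and, on 9.0 and later, override it by hand. A rough
sketch (the table and column names here are made up):

  -- what ANALYZE currently thinks
  SELECT n_distinct FROM pg_stats
  WHERE tablename = 'orders' AND attname = 'customer_id';

  -- force the estimate; a negative value means "this fraction of the
  -- row count". Takes effect at the next ANALYZE.
  ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = -0.05);
  ANALYZE orders;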

> What is the unit-of-measure used for default_statistics_target?

It's the number of entries stored in the histogram and
most-common-values list for each column in pg_statistic.
See also http://www.postgresql.org/docs/devel/static/planner-stats.html:
"The amount of information stored in pg_statistic by ANALYZE, in
particular the maximum number of entries in the most_common_vals and
histogram_bounds arrays for each column, can be set on a
column-by-column basis using the ALTER TABLE SET STATISTICS command, or
globally by setting the default_statistics_target configuration
variable. The default limit is presently 100 entries."
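So, for example, to raise the target for one problematic column and
re-analyze (table and column names invented for illustration):

  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
  ANALYZE orders;

or raise default_statistics_target in postgresql.conf to change the
default for all columns.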
- Heikki