Re: autovacuum strategy / parameters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> My guess is that the reason we run ANALYZE more frequently than vacuum
> (with the default settings) is that ANALYZE is pretty cheap.  In many
> cases, if the statistical distribution of the data hasn't changed
> much, then it's not really necessary, but it doesn't cost much either.
>  And for certain types of usage patterns, like time series (where the
> maximum value keeps increasing) it's REALLY important to analyze
> frequently.
> 
> But having said that, on the systems I've worked with, I've only
> rarely seen a problem caused by not analyzing frequently enough.  On
> the other hand, I've seen MANY problems caused by not vacuuming
> enough. 

Which is the opposite of my experience; currently we have several
clients who have issues which required more-frequent analyzes on
specific tables.   Before 8.4, vacuuming more frequently, especially on
large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
with 8.4 it's not something you want to increase without thinking about
the tradeoffs.

Since I'm responsible for the current defaults, I though I'd explain the
reasoning behind them.  I developed and tested them while at Greenplum,
so they are *not* designed for small databases.

#autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50

These two are set to the minimum threshold to avoid having small tables
get vacuum/analyzed continuously, but to make sure that small tables do
get vacuumed & analyzed sometimes.

#autovacuum_vacuum_scale_factor = 0.2

This is set because in my experience, 20% bloat is about the level at
which bloat starts affecting performance; thus, we want to vacuum at
that level but not sooner.  This does mean that very large tables which
never have more than 10% updates/deletes don't get vacuumed at all until
freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
you don't *want* to vacuum a billion-row table which has only 100,000
updates.

#autovacuum_analyze_scale_factor = 0.1

The 10% threshold for analyze is there because (a) analyze is cheap, and
(b) 10% changes to a table can result in very bad plans if the changes
are highly skewed towards a specific range, such as additions onto the
end of a time-based table.

The current postgres defaults were tested on DBT2 as well as pgbench,
and in my last 2 years of consulting I've seldom found reason to touch
them except on *specific* tables.  So I still feel that they are good
defaults.

It would be worth doing a DBT2/DBT5 test run with different autovac
settings post-8.4 so see if we should specifically change the vacuum
threshold.  Pending that, though, I think the current defaults are good
enough.

-- 
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux