Help tuning autovacuum - seeing lots of relation bloat

"jody brownell" <jody.brownell@xxxxxxxxxx> · Wed, 21 Jun 2006 10:52:42 -0300

Hey - I am running into a data relation bloat problem which I believe is causing fairly significant 
slowdown of my updates. I am using version

                                   version
-----------------------------------------------------------------------------
 PostgreSQL 8.1.4 on i586-trustix-linux-gnu, compiled by GCC gcc (GCC) 3.2.3

After about 12 hours of running, my updates are causing lots of reads and iowait (45%) slowing 
everything down. The DB bloats from 259MB to 2.4 - 3.4GB.

The primary table which is troubled is called target and reaches a size of in mb of 834MB from its 
freshly 'vacuum full analyze' size of 39MB.

qradar=# select * from q_table_size;
           tablename            |  size
--------------------------------+---------
 target                         | 834.496

My configuration includes.

shared_buffers = 32767
work_mem = 20480
maintenance_work_mem = 32768
max_fsm_pages = 4024000
max_fsm_relations = 2000
fsync = false
wal_sync_method = fsync
wal_buffers = 4096
checkpoint_segments = 32
checkpoint_timeout = 1200
checkpoint_warning = 60
commit_delay = 5000
commit_siblings = 5
effective_cache_size = 175000
random_page_cost = 2
autovacuum = true
autovacuum_naptime = 60
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.08
autovacuum_analyze_scale_factor = 0.08
#autovacuum_vacuum_cost_delay=100
#autovacuum_vacuum_cost_limit=100
default_statistics_target = 40

For the particular table I have pg_autovacuum overrides as 

app=# select * from pg_autovacuum where vacrelid = 16603;
 vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh | anl_scale_factor | vac_cost_delay | vac_cost_limit
----------+---------+-----------------+------------------+-----------------+------------------+----------------+----------------
    16603 | t       |             200 |             0.01 |             200 |             0.01 |              0 |            400

What I am seeing is, after about 12 hours an update of a few thousand records takes about 2+ minutes as opposed the 100ms it used 
to take. I can restore performance only be stopping everything, perform a vacuum full analyze and restarting.

After the vacuum full, my table returns to the expected 250+ MB from the previous size.

qradar=# select * from q_table_size ;
           tablename            |  size
--------------------------------+---------
 target                         | 841.536

I can see autovacuum in top every 60 seconds as configured, but it is there and gone in the 1 second refresh. My table grows consistent 
every transaction to no avail. To stop the growth, I had to perform a manual vacuum analyze. But at this point, performance is so poor 
I have to perform vacuum analyze full.

Anyway, I am totally confused. My first cut at changing the autovacuum configuration was using Jim Nasby' advice by cutting all values in 
half leaving my tables at roughly 20% dead space, for this table, that would be just over 50k tuples. This however yields the same results 
as the above configuration with continous bloat. So, I was WAY more aggressive as shown above with no improvment. By calculation, Jims advice
would suffice for our system.

I just checked a production box which is running  8.1.1 and it is behaving as expected. This configuration only specifies "autovacuum = true", 
everything else is left to the defaults.

Is there something whacked about my configuration? Is there a way I can troubleshoot what autovacuum is doing or why it is not performing 
the work?

Here is the output for the vacuum full of target...

qradar=# vacuum full analyze verbose target;
INFO:  vacuuming "public.target"
INFO:  "target": found 5048468 removable, 266778 nonremovable row versions in 96642 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 140 to 144 bytes long.
There were 1696 unused item pointers.
Total free space (including removable row versions) is 730074628 bytes.
89347 pages are or will become empty, including 0 at the end of the table.
95261 pages containing 730030436 free bytes are potential move destinations.
CPU 2.31s/1.27u sec elapsed 6.46 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  5048468 index row versions were removed.
40 index pages have been deleted, 40 are currently reusable.
CPU 0.91s/5.29u sec elapsed 6.24 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  5048468 index row versions were removed.
30 index pages have been deleted, 30 are currently reusable.
CPU 0.45s/4.96u sec elapsed 5.43 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  5048468 index row versions were removed.
17106 index pages have been deleted, 17106 are currently reusable.
CPU 0.79s/3.31u sec elapsed 4.10 sec.
INFO:  "target": moved 266719 row versions, truncated 96642 to 4851 pages
DETAIL:  CPU 5.19s/8.86u sec elapsed 14.27 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  266719 index row versions were removed.
41 index pages have been deleted, 41 are currently reusable.
CPU 0.78s/0.54u sec elapsed 1.32 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  266719 index row versions were removed.
31 index pages have been deleted, 31 are currently reusable.
CPU 0.49s/0.44u sec elapsed 0.93 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  266719 index row versions were removed.
16726 index pages have been deleted, 16726 are currently reusable.
CPU 0.33s/0.38u sec elapsed 0.76 sec.
INFO:  analyzing "public.target"
INFO:  "target": scanned 4851 of 4851 pages, containing 266778 live rows and 0 dead rows; 12000 rows in sample, 266778 estimated total rows
VACUUM

A db wide vacuum full outputs this at the end.

INFO:  free space map contains 32848 pages in 159 relations
DETAIL:  A total of 24192 page slots are in use (including overhead).
24192 page slots are required to track all free space.
Current limits are:  4024000 page slots, 2000 relations, using 23705 KB.

So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what. 

Thanks again.
Jody