Re: vacuumdb vs. max_connections: SELECT waiting

"Thomas F. O'Connell" <tfo@xxxxxxxxxxxx> · Sat, 13 May 2006 03:47:00 -0500

On May 13, 2006, at 12:35 AM, Tom Lane wrote:

VACUUM FULL does all right at packing the table (except in  
pathological
cases, eg a very large tuple near the end of the table).  It mostly
bites as far as shrinking indexes goes, however.  If you've got a
serious index bloat problem then REINDEX is the only solution.   
CLUSTER
does an automatic REINDEX after compacting the table --- it doesn't  
have
any special properties as far as the index space goes.  However, if
you've got serious table bloat then CLUSTER will probably be quicker
than VACUUM FULL.

So my reading of VACUUM VERBOSE output leads me to believe that I  
could actually interpret both table and index bloat for the entire  
database from it (and suggests a useful reporting tool...).

This is from the docs:

INFO:  index "onek_unique1" now contains 1000 tuples in 14 pages
DETAIL:  3000 index tuples were removed.
0 index pages have been deleted, 0 are currently reusable.

Does this imply that 75% of the tuples in this index were free space?  
Even so, since this is an index, the pages aren't shrunk, per your  
note above, and a REINDEX would be required for reclamation, right?

And, then, later:

INFO:  "onek": removed 3000 tuples in 108 pages
DETAIL:  CPU 0.01s/0.06u sec elapsed 0.07 sec.
INFO:  "onek": found 3000 removable, 1000 nonremovable tuples in 143  
pages
DETAIL:  0 dead tuples cannot be removed yet.

Which implies that this table was also 75% bloated? And a VACUUM FULL  
(or CLUSTER) could improve this even more than the VACUUM VERBOSE  
ANALYZE from the example in the docs?

What is the likely performance impact of having database (table or
index) bloat from several months in a > 100 GB database with tens of
thousands of relations of wildly varying sizes and insufficient FSM
settings? If autovacuum is keeping up with statistics for index
usage, is the only potential impact related to disk usage (in basic
filesystem terms)?

Unless you have a lot of seqscan-using queries, there's no particular
reason to panic over file bloat that I can see.  It's just a matter of
how close you are to running out of disk space ...

Unfortunately, there are several seqscan-using queries and several  
large tables (with the largest currently approaching 13% of physical  
memory).

I'm actually trying to prioritize administrative operations that  
could result in noticeable performance gains. If compacting tables  
and indexes turns out to be low on the pole in terms of performance  
considerations, then I'm inclined to look elsewhere, especially  
considering the administrative headache (from the perspective of the  
related application) required to undertake a database-wide CLUSTER or  
VACUUM FULL + REINDEX.

--
Thomas F. O'Connell
Database Architecture and Programming
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)