Re: Vaccuum best practice: cronjob or autovaccuum?

Bill Moran <wmoran@xxxxxxxxxxxxxxxxxxxxxxx> · Thu, 28 Aug 2008 12:47:59 -0400

In response to Joao Ferreira gmail <joao.miguel.c.ferreira@xxxxxxxxx>:
> 
> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, despite it grows back after a few
> hours. It's just re-assuring to me to make sure that at least one of the
> vacuums it's working when I see the weekly drop of disk occupation.

Note the "it grows back after a few hours" is unneeded load on your
system.  You just spent a lot of IO shrinking a file that PG is now going
to spend a lot of IO re-enlarging.  Thus you made PG work harder for a
while after the VACUUM FULL in order to get the table back to a stable
size.

> quoting:
> 
> http://www.postgresql.org/docs/8.3/static/routine-reindex.html
> 
> ---------------------------------------------
> The potential for bloat in non-B-tree indexes has not been well
> characterized.
> ----------------------------------------------------
> 
> -------------------------------------------------
> It is a good idea to keep an eye on the index's physical size when using
> any non-B-tree index type. 
> ----------------------------------------------------

Both special cases.  B-tree's are default.

> ----------------------------------------------------
> Also, for B-tree indexes a freshly-constructed index is somewhat faster
> to access than one that has been updated many times, because logically
> adjacent pages are usually also physically adjacent in a newly built
> index. (This consideration does not currently apply to non-B-tree
> indexes.) It might be worthwhile to reindex periodically just to improve
> access speed. 
> ----------------------------------------------------------
> 
> I did some tests (lets say about 50 INSERT/UPDATES per second and somw
> thousande DELETEs once in a while).
> 
> Query execution time DROPs 10 times after REINDEXING and VACUUM

This would be interesting to this conversation if you _only_ did the
REINDEX.  The addition of the VACUUM makes it ambiguous as to which
command actually made the improvement.

> I've seen 600Megas DROP in disk size ocupation just by reindexing;

Is that consistent, or was it an extraordinary case?  It's pretty amazing
to imagine 600M of index bloat on a 100M database ... how many indexes do
you have?  Are you sure you don't have duplicate indexes or some other
issue?

> additionally about 500 Megas drop by VACCUMING; this was on a 100Megas
> of usefull data.

It's probably because you're only vacuuming once a week.

> > That's a pretty lousy maintenance plan with lots of unneeded overhead,
> > unless you're database has a very edge-case access pattern.
> 
> !? Many people mention these 'edeg-case access pattern', and 'corner
> cases'. I don't now if mine is such. My appliucation executes patterns
> of varying INSERTING and UPDATING but it must overcome stress tests with
> about 100 UPDATES mixed with INSERTS per second. Additionaly it DELETEs
> up to 10000 records once every night.
> 
> It's a 'near-realtime' log store which stores log information for 'you
> name it' logs in Linux based systems. It also removes old data once a
> night.

That is a fairly edge-case use that's actually documented in the docs you
reference to be likely to cause abnormal amounts of bloat.  However, it's
just one index on one table (unless you have a bunch of indexes on that
table that are all continually increasing?)  Unless there's something I'm
missing, it hardly justifies reindexing the entire database.

> > VACUUM FULL really only needs done in extreme cases where massive data
> > bloat is experienced, and not expected to happen again.  If massive data
> > bloat is routine, you're probably better off letting ordinary VACUUM
> > maintain a consistent level of free space on tables.
> 
> well. I've seen massive data and index bloating in my application. In
> some cases REINDEXING and VACUUM FULL ing was the only way to clear up
> things.

It's possible that the stable size for you tables has more free space than
actual data.  While that seems crazy, it's really no different than having
a file server with 4x as much disk space as people normally are using.

> > REINDEXING seems nice at times, but I've yet to see any evidence that
> > it's necessary.  There are probably some corner cases, but I've not seen
> > them documented. 
> 
> neither did I. REINDEXING helps me keep query execution time low.
> Othewise it will increase to levels I cannot have. We use 'modest'
> processors and 'modest' storage, in highly dependable embedded systems.

Oh ... well this might change a lot.  Depending on what you mean by
"embedded", you may be using experience that's _very_ specialized to
make generalizations from.

The RAM/CPU/storage/etc constraints on anything I would call "embedded"
are different than just about anyone else would be dealing with for
tuning a DB server.  If you're looking to tune PG to run well on an
embedded platform, my advice would be to throw out everything you've
heard and test extensively for yourself -- which sounds like what
you've done.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@xxxxxxxxxxxxxxxxxxxxxxx
Phone: 412-422-3463x4023