Re: CREATE TABLE slowing down significantly over time

Scott Marlowe wrote:
> Also note that the argument that autovacuum chews up too much IO is
> moot now that you can set cost delay to 10 to 20 milliseconds.  Unless
> you're running on the hairy edge of maximum IO at all times, autovac
> should be pretty much unnoticed.

And if you're running on the hairy edge like that, you really need autovacuum whether you think you can afford it or not. Badly maintained tables are also I/O intensive, and it's easy for someone who thinks "I'm too busy to allocate VACUUM time" to end up wasting more resources than it would have taken to just do things right in the first place. I see way too many people who suffer from false economy when it comes to autovacuum planning.
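As a concrete sketch of the cost-delay throttling Scott mentions (the table name is just an example; the per-table syntax requires 8.4 or later):

```sql
-- Global throttle, in postgresql.conf: sleep 20ms each time autovacuum
-- hits its cost limit. Takes effect on reload, no restart needed.
--   autovacuum_vacuum_cost_delay = 20ms

-- Or override the delay for one busy table (8.4+ storage parameter;
-- "mytable" is a placeholder):
ALTER TABLE mytable SET (autovacuum_vacuum_cost_delay = 20);
```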

Some comments on this whole discussion:

1) You don't end up with dead rows [auto]vacuum needs to clean up only when you DELETE things. They show up when you UPDATE things, too--the original row isn't removed until after the new one is written. The overhead isn't as bad on UPDATEs in 8.3 and later (thanks to HOT), but just because you don't delete doesn't mean you don't need VACUUM to clean up dead rows.
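You can watch this happen via the statistics collector; something like the following shows which tables are accumulating dead rows and when autovacuum last touched them:

```sql
-- Dead rows accumulate from UPDATEs as well as DELETEs; the stats
-- collector tracks them per table. Tables at the top of this list
-- are the ones most in need of vacuuming.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```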

2) Any time you find yourself considering VACUUM FULL to clean things up, you're probably making a mistake; the situations where it's the only tool that can recover things tend to be broader disasters. The guidelines at http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html spell out my feeling here as a tip: "the best way is to use CLUSTER or one of the table-rewriting variants of ALTER TABLE". If you're fighting performance issues because vacuuming was mismanaged in the past and an unknown amount of garbage has piled up in your tables, it's quite possible you'll find the reinvigorated performance you get from CLUSTER worth the maintenance cost of holding an exclusive lock while it runs. See http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959 for more information. I run CLUSTER all the time, and every time I think I'm saving time by doing VACUUM FULL/REINDEX instead, I regret it as another false economy. Turn on autovacuum, make it run all the time but at a slower average speed if you're concerned about its overhead, and use CLUSTER once to blow away the bloat accumulated before you were doing the right things.
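The one-time cleanup looks like this (table and index names are placeholders; the USING form is 8.4 syntax, older versions spell it CLUSTER indexname ON tablename):

```sql
-- Rewrite the table in the physical order of one of its indexes,
-- discarding all dead space. Holds an exclusive lock for the duration,
-- so schedule it in a maintenance window.
CLUSTER mytable USING mytable_pkey;

-- The rewrite invalidates the planner's statistics; refresh them.
ANALYZE mytable;
```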

3) There is a lot of speculation here and no measurements. What I'd do in this case is run something like the query at http://wiki.postgresql.org/wiki/Disk_Usage (but without the lines that filter out pg_catalog, because the catalogs are a strong suspect here) regularly while debugging. Measure how big all the catalog tables and indexes are, do the operations you believe make things better or worse, then measure again. Turn autovacuum on, repeat the test, and see if things are different. This type of problem tends to be easy to quantify.
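A simplified variant of that wiki query, with no filter so the system catalogs show up too, might look like:

```sql
-- Size of every relation, catalogs included--a prime suspect when
-- CREATE TABLE itself is slowing down. Run before and after each
-- test step and compare the numbers.
SELECT n.nspname, c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;
```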

--
Greg Smith    greg@xxxxxxxxxxxxxxx    Baltimore, MD


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
