On 11/07/2012 12:42 PM, Tom Lane wrote:
So you've turned off autovacuum, and are carefully not vacuuming the
system catalogs. That's your problem all right. Is there a
particularly good reason why this script isn't a one-liner "VACUUM"?
Back in the 8.x days, we experienced "vacuum full analyze" occasionally
causing other processes to hang/timeout. In an attempt to minimize the
impact of the locking, we updated the script to vacuum one table at a
time, which seemed to work well throughout the 8.x series. I'd happily
accept that this conclusion may have simply have been wrong, but it
worked well enough that nobody complained and life was good. After
switching to 9.x, we read that the "full" vacuum was less useful and so
the script was changed to "vacuum analyze $table" rather than "vacuum
full analyze $table".
Are you sure that once-a-day vacuuming is sufficient, even if it was
covering the system catalogs? If you've managed to bloat pg_attribute
to 36GB, I suspect you've got enough churn (perhaps from temp tables)
that you really need the catalogs vacuumed more often.
The only thing that I could find in the docs even mentioning the idea of
vacuuming catalogs is this sentence:
(A manual VACUUM should fix the problem, as suggested by the hint; but
note that the VACUUM must be performed by a superuser, else it will fail
to process system catalogs and thus not be able to advance the
database's datfrozenxid.)
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
This does NOT clearly say that the end user could vacuum catalogs, let
alone that it's necessary or even a good idea. Otherwise, the only
mention is of tables, and there's no mention of the idea that tables are
anything but user space.
My advice is dump, reload, and *don't* turn off autovacuum.
... because it
occasionally causes transactions and queries to hang when an update
causes a vacuum mid-day, effectively taking us offline randomly.
I suspect this claim is based on ancient and no longer very relevant
experience.
We tried several times to turn on autovacuum with 9.1 and had problems
every time. If our use case is particularly special, I'd love to work
with you to get autovacuum to work in our situation too as it would make
life easier for us! But for the past few months, every time we've turned
it on, we've had our phones swamped with customers who are unable to use
our system while our application monitors scream bloody murder, at least
weekly.
From what we could tell (under extreme pressure to get it all working
again ASAP, mind you) it seemed that when doing a large update from
within a transaction, autovacuum would get triggered before the
transaction completed, causing the transaction to hang or at least slow
way down, causing timeouts to occur with load balancers, so customers
would then try again, compounding the ongoing problem. Pretty soon you
have not only I/O issues, but also locking issues and upset customers.
This issue may be compounded because we make fairly extensive use of
dblink and temp tables to aggregate data for our customers who have
multiple sites.
-Ben
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general