As I've spent a considerable amount of time trying to sort this out, I'm
posting it for the benefit of other users.
I've experienced persistent issues with autovacuum in a mixed
read/write environment on midrange hardware (16-core Xeon, 128 GB RAM,
200 GB SATA3 6 Gb/s SSDs for disk I/O, on RHEL 6), using the defaults in
the 9.1 RPMs provided by the PostgreSQL project (yum.postgresql.org).
The cause is not yet determined; it may be related to any or all of the
following:
A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and
multiple savepoints;
D) concurrent use of pg_dump;
E) use of numerous databases on a single server, averaging about 50.
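For anyone trying to diagnose something similar: a quick way to see
whether autovacuum is keeping up is to look at dead-tuple counts and
last-vacuum times in the statistics views. This is a generic diagnostic,
not part of the script described below:

    -- Tables with the most dead tuples, and when (auto)vacuum last ran.
    SELECT schemaname, relname, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum
    FROM pg_stat_all_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;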
To work around this, we turned off autovacuum and used an old script to
vacuum the tables in the middle of the night when nobody was looking.
Unfortunately, the vacuum script only vacuumed the "userland" tables,
and tremendous amounts of disk space were being wasted, particularly in
each database's pg_attribute table.
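To see how bad the catalog bloat is in any one database, something like
this works (pg_size_pretty and pg_total_relation_size are standard
functions; pg_attribute was simply our worst offender):

    -- Largest system catalogs in the current database.
    SELECT c.relname,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'pg_catalog'
      AND c.relkind = 'r'
    ORDER BY pg_total_relation_size(c.oid) DESC
    LIMIT 10;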
However, running "VACUUM ANALYZE", "VACUUM FULL ANALYZE", or "VACUUM
FULL VERBOSE ANALYZE" without naming specific tables did not fix the
excess disk usage; in all cases, usage remained at least 5x the expected
amount. (In one case, all of these open-ended vacuums accomplished
almost nothing.)
Nor did running any variation of "VACUUM ANALYZE $table" in a loop
through all tables (including the pg_* catalog tables) completely
resolve the issue.
In order to completely clean things up, we ended up writing a script to
do the following:
1) Determine the databases using excessive disk space, in descending
order of use with this query:
SELECT
    d.datname AS Name,
    d.datistemplate::int AS datistemplate,
    pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE -1
    END AS Size
FROM pg_catalog.pg_database d
ORDER BY
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE NULL
    END DESC NULLS FIRST;
2) For each database from #1, get a list of tables to be cleaned up with
this query:
SELECT
    nspname || '.' || relname AS "table",
    pg_total_relation_size(C.oid) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
-- Note: the "true OR" deliberately short-circuits the schema filter so
-- that catalog tables (e.g. pg_attribute) are included in the cleanup.
WHERE (true OR (nspname NOT IN ('pg_catalog', 'information_schema')))
    AND C.relkind = 'r'
    AND nspname !~ '^pg_toast'
ORDER BY
    pg_total_relation_size(C.oid) DESC;
3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;
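If you'd rather not hand-roll the loop in step 3, the statements can be
generated in SQL itself, captured to a file with psql's \o (under \t so
the output is clean SQL), and replayed with \i. A minimal sketch, using
the same filter as the step-2 query:

    -- Emit a REINDEX + VACUUM pair per table, largest tables first.
    SELECT 'REINDEX TABLE ' || quote_ident(n.nspname) || '.'
               || quote_ident(c.relname) || E';\n'
           || 'VACUUM FULL ANALYZE ' || quote_ident(n.nspname) || '.'
               || quote_ident(c.relname) || ';'
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND n.nspname !~ '^pg_toast'
    ORDER BY pg_total_relation_size(c.oid) DESC;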
The end result is a squeaky-clean database server with expected disk usage.
NOTES:
1) The above queries are derived from queries originally written to
report disk usage, which is why they return extra columns that the
cleanup script doesn't actually use.
2) It was sheer chance that I discovered the need to reindex before
vacuuming in order to get the disk space back.
3) I'd like to get autovacuum to work. I've read suggestions to tweak
autovacuum_vacuum_cost_delay and/or autovacuum_vacuum_cost_limit. I
haven't yet determined whether the problem is I/O-based or
lock/deadlock-based. I'm guessing the latter, though it's hard to tell
because queries stack up quickly and the load average goes sky-high
when autovacuum fails for us.
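For the record, the cost settings can also be applied per table as
storage parameters, which makes it possible to experiment on the worst
offenders without touching postgresql.conf. The table name is a
placeholder and the values are illustrative, not a recommendation:

    ALTER TABLE some_busy_table SET (
        autovacuum_vacuum_cost_delay = 10,   -- milliseconds; default 20
        autovacuum_vacuum_cost_limit = 1000  -- default -1 (inherits vacuum_cost_limit)
    );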
4) The aforementioned process is S-L-O-W. Think at least hours and
probably days depending on your databases, your server(s), and the load.
5) I don't yet know if the "full" option for the vacuum is necessary to
free up all space. I will experiment with this and post results if useful.
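One way to answer that is to record a table's size before and after
each vacuum variant and compare; 'some_table' below is a placeholder:

    -- Run before and after each VACUUM variant.
    SELECT pg_size_pretty(pg_relation_size('some_table'))       AS heap_only,
           pg_size_pretty(pg_total_relation_size('some_table')) AS with_indexes_and_toast;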