On Apr 16, 2010, at 9:48 AM, Tom Lane wrote:

> Josh Kupershmidt <schmiddy@xxxxxxxxx> writes:
>> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>> Wow. Well, we have a smoking gun here: for some reason, autovacuum
>>> isn't running, or isn't doing its job if it is. If it's not running
>>> at all, that would explain failure to prune the stats collector's file
>>> too.
>
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks? What have you got the autovacuum cost delays set to?
>
> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed. So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch). So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...

On any large system with good I/O I have had to significantly increase the
aggressiveness of autovacuum. Even with the settings below, it doesn't
interfere with other activity (the I/O subsystem here can do roughly 2200
random iops and 900 MB/sec sequential).

My relevant autovacuum parameters are (from 'show all'):

 autovacuum                      | on        | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
 autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit    | 2000      | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold     | 50        | Minimum number of tuple updates or deletes prior to vacuum.

For what it is worth, I just went onto one of my systems -- one with lots of
partition tables and temp table creation/destruction -- and looked at the
system tables in question there. This is Postgres 8.4; the \dt+ output below
is trimmed to the interesting tables:

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 195 MB  |
 pg_catalog | pg_attribute  | table | postgres | 1447 MB |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 2300 MB |
 pg_catalog | pg_type       | table | postgres | 181 MB  |

So I ran a VACUUM FULL; REINDEX TABLE; ANALYZE sequence on each of these.
I wish I could just CLUSTER them, but the above works.
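Concretely, that was roughly the following for each of the tables above
(pg_attribute shown here as the example; note that VACUUM FULL and REINDEX
take exclusive locks on the catalog, so I do this during a quiet window):

    VACUUM FULL pg_catalog.pg_attribute;
    REINDEX TABLE pg_catalog.pg_attribute;
    ANALYZE pg_catalog.pg_attribute;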
Now the tables are:

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 44 MB   |
 pg_catalog | pg_attribute  | table | postgres | 364 MB  |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 656 MB  |
 pg_catalog | pg_type       | table | postgres | 45 MB   |

I've learned to accept about 50% bloat (i.e. twice the compacted size) in
Postgres as just the way it usually is on a busy table, but the 3x and 4x
bloat of pg_statistic, pg_attrdef, and pg_attribute has me wondering.

I have had some 'idle in transaction' connections hanging around from time
to time that have caused issues on this machine, and they could explain the
perma-bloat above. That is one thing that could affect the case reported
here as well. The worst thing about those is that you can't force-kill the
connections from within Postgres (pg_cancel_backend doesn't work on them,
and killing them via the OS bounces Postgres), so you have to hunt down the
offending client by way of pg_stat_activity; a sketch of that kind of query
is at the end of this message.

> regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
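The 'idle in transaction' query mentioned above, roughly -- a sketch for 8.4,
where such sessions show up with current_query = '<IDLE> in transaction'
(on 9.2 and later the columns are pid, state and query instead of procpid
and current_query, so adjust accordingly):

    -- list sessions sitting idle inside an open transaction, oldest first,
    -- with enough client info to track down the offending application
    SELECT procpid, usename, client_addr, client_port, xact_start
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
     ORDER BY xact_start;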