On May 27, 2015, at 2:00 PM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote: > Charles Sprickman wrote: >> On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@xxxxxxxxx> wrote: >> >>> run this for validation >>> >>> select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum; >> >> Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed. >> >> While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the last_autovacuum column: > > Does your system crash frequently? autovacuum would not process files > unless they appear to cross the threshold values; and the statistics are > reset whenever there's a crash recovery. No frequent crashes. Not any that I know of even since moving to this server (up 1046 days). So thats one thing to rule out. I also know that no one has manually reset any of the stats counters. > Perhaps those tables don't > have enough recent activity. I see your oldest autovac dates are > 2014-07 so perhaps you had crashes sometime before that. This is starting to make sense. On the same database, this data is kind of interesting: relname | n_tup_ins | n_tup_upd | n_tup_del | last_autovacuum | autovacuum_count ---------------------+-----------+-----------+-----------+-------------------------------+------------------ optin | 7252274 | 0 | 1975631 | 2014-07-21 06:31:47.66319-04 | 3 log | 43045279 | 0 | 31641637 | 2014-10-18 04:55:22.227628-04 | 8 complaint | 3002663 | 6154 | 3002973 | 2014-10-18 10:43:18.224247-04 | 31 arch_subscriber | 8472659 | 8184029 | 96798 | 2014-11-19 08:27:32.895688-05 | 2 subscriber | 13625052 | 28592779 | 8473154 | 2014-12-05 10:12:48.340305-05 | 14 transactions | 2514676 | 5182 | 257117 | 2015-02-18 12:47:42.406487-05 | 3 post | 32521 | 50722 | 1557 | 2015-05-23 01:46:47.039179-04 | 1 post_actions | 98395891 | 43836 | 44881522 | 2015-05-26 10:03:43.259077-04 | 3 bounce | 631024 | 81 | 595087 | 2015-05-26 10:29:34.888937-04 | 13 bounce_reason | 7961281 | 32736 | 7953428 | 2015-05-26 10:29:34.918845-04 | 6 list_sub | 14562995 | 69515258 | 9009496 | 2015-05-28 15:58:13.754569-04 | 30 post_job | 8127 | 15908 | 8070 | 2015-05-28 21:52:54.923614-04 | 263 list_data | 7350 | 756705 | 6063 | 2015-05-29 12:34:42.323142-04 | 425 post_stats | 90155 | 40750025 | 108 | 2015-05-29 13:14:43.307775-04 | 569 limit_tracking | 30248 | 12689580 | 30192 | 2015-05-29 13:54:42.06234-04 | 50780 email | 0 | 0 | 0 | | 0 sending_rule_fields | 0 | 0 | 0 | | 0 sending_rule_files | 0 | 0 | 0 | | 0 post_threads | 0 | 0 | 0 | | 0 forwards | 0 | 0 | 0 | | 0 list | 9 | 28 | 7 | | 0 sub_active | 840019 | 0 | 0 | | 0 subscriber_db_field | 0 | 0 | 0 | | 0 slice_fields | 0 | 0 | 0 | | 0 sending_rule_esps | 0 | 0 | 0 | | 0 client | 1 | 0 | 0 | | 0 slices | 0 | 0 | 0 | | 0 fields | 11 | 0 | 0 | | 0 arch_list_sub | 8909636 | 0 | 101140 | | 0 sending_rules | 0 | 0 | 0 | | 0 subscriber_db_query | 0 | 0 | 0 | | 0 engagement | 0 | 0 | 0 | | 0 reject_rule | 608 | 0 | 0 | | 0 list_criterion | 0 | 0 | 0 | | 0 repltest | 0 | 0 | 0 | | 0 post_attachment | 1 | 0 | 0 | | 0 filters | 0 | 0 | 0 | | 0 messages | 0 | 0 | 0 | | 0 post_domain | 262398 | 1177 | 290 | | 0 track_url | 786 | 1394 | 1 | | 0 subscriber_data | 0 | 0 | 0 | | 0 message_attachments | 0 | 0 | 0 | | 0 migrations | 11 | 0 | 0 | | 0 subscriptions | 8606373 | 463446 | 0 | | 0 mailer | 0 | 0 | 0 | | 0 subscriber_db_file | 0 | 0 | 0 | | 0 track_hit | 29733544 | 86038 | 6383080 | | 0 engagement_rules | 0 | 0 | 0 | | 0 (48 rows) Of all the tables that have NOT been autovacuumed, only a handful have any real write activity. And on those I cant really get any data as to WHEN the last activity was (unless I reset stats and wait) although looking a bit closer I see some timestamped columns and none of these tables have been touched in years. Good, I think. Should I be worried about limit_tracking which has been vacuumed about 50,000 times in the last few days? :) That seems a little extreme. Now if we assume the tables that werent vacuumed are not seeing any new activity, my next problem is how to satisfy my nagios check_postgres checks that look for tables that have not been vacuumed in X days. > How many workers are currently processing tables? (See > pg_stat_activity) The more workers there are, the slower they become. By workers, I assume you just mean pg processes? It varies, but anywhere between a few dozen and a hundred. > Perhaps they are all busy processing large tables and they never finish > because of the vacuum_delay. On the other hand, naptime=40min means > that from one worker start to the next one there's a 40min wait, which > seems a bit too long. Normally the default of 1min is appropriate; why > did you change that? I think because I dont fully grasp how all these variables interact. IIRC, when I set this it had something to do with compensating for the number of databases and the number of workers. Im looking at pgbadger’s analysis of vacuums and I think Im mostly OK with whats happening. Im still stumped on why changes from defaults were necessary to get autovacuum to kick in though Thanks, Charles > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin