Hello,
We recently upgraded a 17 TB database from Postgres 9.6 to 12 using pg_upgrade. After this upgrade, we started observing that autovacuum would get in a loop about every 5 seconds for certain tables. This usually happened to be the toast table of the relation. This causes the performance of the table to decrease substantially. A manual VACUUM of the table resolves the issue. Here is an example of what we see in the log:
2020-11-04 16:34:38.131 UTC [892980-1] ERROR: canceling autovacuum task
2020-11-04 16:34:38.131 UTC [892980-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:41.878 UTC [893355-1] ERROR: canceling autovacuum task
2020-11-04 16:34:41.878 UTC [893355-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:45.208 UTC [893972-1] ERROR: canceling autovacuum task
2020-11-04 16:34:45.208 UTC [893972-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:47.635 UTC [894681-1] ERROR: canceling autovacuum task
2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s).
2020-11-04 16:34:38.131 UTC [892980-1] ERROR: canceling autovacuum task
2020-11-04 16:34:38.131 UTC [892980-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:41.878 UTC [893355-1] ERROR: canceling autovacuum task
2020-11-04 16:34:41.878 UTC [893355-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:45.208 UTC [893972-1] ERROR: canceling autovacuum task
2020-11-04 16:34:45.208 UTC [893972-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:47.635 UTC [894681-1] ERROR: canceling autovacuum task
2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT: automatic vacuum of table "x.pg_toast.pg_toast_981540"
Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s).
This behavior hits our pg_statistic toast table a lot. However, it is randomly happening to other tables infrequently.
I know that in PG 12 there is the new vacuum_truncate option that we can turn off on a table by table basis. However, that is a concern since this randomly happens to different tables.
I am seeking to understand what the best practice is in this case.
- Should we turn off vacuum_truncate for all tables or just turn off vacuum_truncate for the tables as they happen?
- Should we run a "VACUUM (TRUNCATE TRUE) table_name" manually for the tables where vacuum_truncate is turned off?
- Is there another database level solution I am not aware of?
- Lastly, does anyone know why this started happening after our upgrade from PG 9.6 to 12?
Many thanks,
Creston