On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote: > Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes: > > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic pg_attrdef pg_constraint; do echo "$db.$t..."; PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; done; done > > > ... > > postgres.pg_statistic... > > postgres.pg_attrdef... > > postgres.pg_constraint... > > template1.pg_statistic... > > template1.pg_attrdef... > > template1.pg_constraint... > > ts.pg_statistic... > > ERROR: canceling statement due to statement timeout > > ts.pg_attrdef... > > ts.pg_constraint... > > postgres.pg_statistic... > > ERROR: missing chunk number 0 for toast value 3372855171 in pg_toast_2619 > > Hm, so was the timeout error happening every time through on that table, > or just occasionally, or did you provoke it somehow? I'm wondering how > your 9s timeout relates to the expected completion time. Actually statement_timeout isn't essential for this (maybe it helps to triggers it more often - not sure). Could you try: time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; done'; date Three servers experienced error within 30min, but one server didn't fail until 12h later, and a handful others still haven't failed.. Does this help at all ? 2018-05-24 21:57:49.98-03 | 5b075f8d.1ad1 | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619 2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619" 2018-05-24 21:57:50.09-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in 99 pages 2018-05-24 21:57:50.12-03 | 5b075f8e.1ada | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic 2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic" 2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR | pryzbyj | postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619 Some thing; this server has autovacuum logging, although it's not clear to me if that's an essential component of the problem, either: 2018-05-24 21:16:39.856-06 | LOG | 5b078017.7b99 | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619 2018-05-24 21:16:39.876-06 | LOG | 5b078010.7968 | | | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 + | | | | | pages: 0 removed, 117 r 2018-05-24 21:16:39.909-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619" 2018-05-24 21:16:39.962-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages 2018-05-24 21:16:40.025-06 | LOG | 5b078018.7b9b | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic 2018-05-24 21:16:40.064-06 | INFO | 5b078018.7b9b | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic" 2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj | postgres | missing chunk number 0 for toast value 765874692 in pg_toast_2619 Or this one? postgres=# SELECT log_time, database, user_name, session_id, left(message,999) FROM postgres_log WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24 19:58') AND (database='postgres' OR database IS NULL OR user_name IS NULL OR user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1; log_time | 2018-05-24 19:56:35.396-04 database | user_name | session_id | 5b075131.3ec0 left | skipping vacuum of "pg_toast_2619" --- lock not available ... log_time | 2019-05-24 19:57:35.78-04 database | user_name | session_id | 5b07516d.445e left | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped frozen : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, oldest xmin: 63803106 : buffer usage: 179 hits, 4 misses, 87 dirtied : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s log_time | 2018-05-24 19:57:35.879-04 database | postgres user_name | pryzbyj session_id | 5b07516f.447f left | missing chunk number 0 for toast value 624341680 in pg_toast_2619 log_time | 2018-05-24 19:57:44.332-04 database | user_name | session_id | 5af9fda3.70d5 left | checkpoint starting: time Justin