Moving this old thread to -hackers https://www.postgresql.org/message-id/flat/20180519142603.GA30060%40telsasoft.com I wanted to mention that this seems to still be an issue, now running pg11.5. log_time | 2019-08-30 23:20:00.118+10 user_name | postgres database | ts session_id | 5d69227e.235 session_line | 1 command_tag | CLUSTER session_start_time | 2019-08-30 23:19:58+10 error_severity | ERROR sql_state_code | XX000 message | unexpected chunk number 1 (expected 0) for toast value 2369261203 in pg_toast_2619 query | CLUSTER pg_statistic USING pg_statistic_relid_att_inh_index application_name | psql Note that my original report was for "missing" chunk during "VACUUM FULL", and the current error is "unexpected chunk" during CLUSTER. I imagine that's related issue. I haven't seen this in awhile (but stopped trying to reproduce it long ago). A recently-deployed update to this maintenance script is probably why it's now doing CLUSTER. On Fri, May 25, 2018 at 08:49:50AM -0500, Justin Pryzby wrote: > On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote: > > Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes: > > > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic pg_attrdef pg_constraint; do echo "$db.$t..."; PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; done; done > > > > > ... > > > postgres.pg_statistic... > > > postgres.pg_attrdef... > > > postgres.pg_constraint... > > > template1.pg_statistic... > > > template1.pg_attrdef... > > > template1.pg_constraint... > > > ts.pg_statistic... > > > ERROR: canceling statement due to statement timeout > > > ts.pg_attrdef... > > > ts.pg_constraint... > > > postgres.pg_statistic... > > > ERROR: missing chunk number 0 for toast value 3372855171 in pg_toast_2619 > > > > Hm, so was the timeout error happening every time through on that table, > > or just occasionally, or did you provoke it somehow? I'm wondering how > > your 9s timeout relates to the expected completion time. > > Actually statement_timeout isn't essential for this (maybe it helps to triggers > it more often - not sure). > > Could you try: > time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; done'; date > > Three servers experienced error within 30min, but one server didn't fail until > 12h later, and a handful others still haven't failed.. > > Does this help at all ? > 2018-05-24 21:57:49.98-03 | 5b075f8d.1ad1 | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619 > 2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619" > 2018-05-24 21:57:50.09-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in 99 pages > 2018-05-24 21:57:50.12-03 | 5b075f8e.1ada | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic > 2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic" > 2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR | pryzbyj | postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619 > > Some thing; this server has autovacuum logging, although it's not clear to me > if that's an essential component of the problem, either: > 2018-05-24 21:16:39.856-06 | LOG | 5b078017.7b99 | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619 > 2018-05-24 21:16:39.876-06 | LOG | 5b078010.7968 | | | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 + > | | | | | pages: 0 removed, 117 r > 2018-05-24 21:16:39.909-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619" > 2018-05-24 21:16:39.962-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages > 2018-05-24 21:16:40.025-06 | LOG | 5b078018.7b9b | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic > 2018-05-24 21:16:40.064-06 | INFO | 5b078018.7b9b | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic" > 2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj | postgres | missing chunk number 0 for toast value 765874692 in pg_toast_2619 > > Or this one? > > postgres=# SELECT log_time, database, user_name, session_id, left(message,999) FROM postgres_log WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24 19:58') AND (database='postgres' OR database IS NULL OR user_name IS NULL OR user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1; > > log_time | 2018-05-24 19:56:35.396-04 > database | > user_name | > session_id | 5b075131.3ec0 > left | skipping vacuum of "pg_toast_2619" --- lock not available > > ... > > log_time | 2019-05-24 19:57:35.78-04 > database | > user_name | > session_id | 5b07516d.445e > left | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 > : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped frozen > : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, oldest xmin: 63803106 > : buffer usage: 179 hits, 4 misses, 87 dirtied > : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s > : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s > > log_time | 2018-05-24 19:57:35.879-04 > database | postgres > user_name | pryzbyj > session_id | 5b07516f.447f > left | missing chunk number 0 for toast value 624341680 in pg_toast_2619 > > log_time | 2018-05-24 19:57:44.332-04 > database | > user_name | > session_id | 5af9fda3.70d5 > left | checkpoint starting: time