[piggybackin' on older (seeming very similar) thread...] On Fri, Mar 29, 2013, Quentin Hartman wrote: > Yesterday morning, one of my streaming replication slaves running 9.2.3 > crashed with the following in the log file: > > 2013-03-28 12:49:30 GMT WARNING: page 1441792 of relation base/63229/63370 > does not exist > 2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index > 1663/63229/109956; iblk 303, heap 1663/63229/63370; > 2013-03-28 12:49:30 GMT PANIC: WAL contains references to invalid pages > 2013-03-28 12:49:30 GMT CONTEXT: xlog redo delete: index I've got a pair of 9.2.4 DBs here in a streaming + log shipping setup. Everything moving like clockwork until a couple of weeks ago, at which time my slave died with the 'WAL contains references to invalid pages' error. After some wild googlin' "research", I saw the index visibility map fix for 9.2.1. We did pg_upgrade in-between versions, but just to be sure I wasn't somehow carrying corrupt data across versions (?), I went ahead and VACUUMed everythng with the vacuum_freeze_table_age set to 0, and went on with my life, hoping I had removed whatever demons were running around in there. Slave died again this weekend: [2013-09-08 02:29:14.712 GMT] 55340: <> : WARNING: page 2424 of relation base/16411/2529375 is uninitialized [2013-09-08 02:29:14.712 GMT] 55340: <> : CONTEXT: xlog redo vacuum: rel 1663/16411/2529375; blk 26937, lastBlockVacuumed 0 [2013-09-08 02:29:14.712 GMT] 55340: <> : PANIC: WAL contains references to invalid pages [2013-09-08 02:29:14.712 GMT] 55340: <> : CONTEXT: xlog redo vacuum: rel 1663/16411/2529375; blk 26937, lastBlockVacuumed 0 [2013-09-08 02:29:14.761 GMT] 55338: <> : LOG: startup process (PID 55340) was terminated by signal 6: Abort trap [2013-09-08 02:29:14.761 GMT] 55338: <> : LOG: terminating any other active server processes Unlike Quentin's original message, simply restarting the slave didn't bring it back to life. I had to pg_start_backup/rsync again from the master, at which point: cp: /usr/local/pgsql/wal/00000003.history: No such file or directory [2013-09-09 03:58:29.235 GMT] 96710: <> : LOG: entering standby mode [2013-09-09 03:58:29.324 GMT] 96710: <> : LOG: restored log file "0000000300000E1E0000004B" from archive [...] [2013-09-09 03:59:05.151 GMT] 96710: <> : LOG: consistent recovery state reached at E1E/89BE3DD8 [2013-09-09 03:59:05.152 GMT] 96708: <> : LOG: database system is ready to accept read only connections cp: /usr/local/pgsql/wal/0000000300000E1E0000008A: No such file or directory [2013-09-09 03:59:05.154 GMT] 96710: <> : LOG: unexpected pageaddr E1D/FB000000 in log file 3614, segment 138, offset 0 cp: /usr/local/pgsql/wal/0000000300000E1E0000008A: No such file or directory [2013-09-09 03:59:05.209 GMT] 96786: <> : LOG: streaming replication successfully connected to primary Seemingly okay again for now, though I'm anticipating similar future failures. The original relation that it had a problem with is a child partitioned table's index: [pgsql ~]$ oid2name -d db01 -o 2529375 From database "db01": Filenode Table Name ------------------------------------ 2529375 syslog_p_2013_32_seq_key Partitioning on year/week_of_year, it definitely shouldn't have been receiving any writes at the time of the failure. Also interesting, both failures were on a Saturday, around the time a base backup is performed (though it failed afterwards in both cases, so this may be coincidence.) Is there any other information I can gather than can help shed light on what is going on? Any recommendations going from here? My master is fine during all of this, but I'm concerned I've got some hidden corruption that will bite me when I least expect it. pgsql=# select version(); version ------------------------------------------------------------------------------------------------------------- PostgreSQL 9.2.4 on amd64-portbld-freebsd9.0, compiled by cc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit -- Mahlon E. Smith http://www.martini.nu/contact.html -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general