Your errors look somewhat similar to a problem I reported last week (no replies thus far):
http://www.postgresql.org/message-id/CAP=oouE5niXgAO_34Q+FGq=tS6EsMHXXRRoByov3zn9pU4249w@xxxxxxxxxxxxxx

Except that in my case, no number of restarts helped.

You didn't say: were you explicitly copying $PGDATA, or using some other mechanism to migrate the data elsewhere? Also, which version of Postgres are you using?

On Fri, Jan 11, 2013 at 6:32 PM, Phil Monroe <phil@xxxxxxxxxxxxxx> wrote:
>
> Hi Everyone,
>
> So we had to fail over and do a full base backup to get our slave database back online, and ran into an interesting scenario. After copying the data directory, setting up the recovery.conf, and starting the slave database, the database crashes while replaying xlogs. However, on starting the database again, it is able to replay xlogs farther than it initially got, but ultimately ends up failing again. After starting the DB a third time, PostgreSQL replays even further and catches up to the master to start streaming replication. Is this common and/or acceptable?
>
> The error messages that cause PostgreSQL to stop replaying xlogs:
>
> ERROR #1:
> 2013-01-12 00:17:17 UTC::@:[61150]: WARNING: page 2387476480 of relation base/16408/18967399 does not exist
> 2013-01-12 00:17:17 UTC::@:[61150]: CONTEXT: xlog redo delete: index 1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
> 2013-01-12 00:17:17 UTC::@:[61150]: PANIC: WAL contains references to invalid pages
> 2013-01-12 00:17:17 UTC::@:[61150]: CONTEXT: xlog redo delete: index 1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
> 2013-01-12 00:17:18 UTC::@:[61121]: LOG: startup process (PID 61150) was terminated by signal 6: Aborted
>
> ERROR #2:
> 2013-01-12 01:02:17 UTC::@:[36831]: WARNING: page 2387607552 of relation base/16408/18967399 does not exist
> 2013-01-12 01:02:17 UTC::@:[36831]: CONTEXT: xlog redo delete: index 1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
> 2013-01-12 01:02:17 UTC::@:[36831]: PANIC: WAL contains references to invalid pages
> 2013-01-12 01:02:17 UTC::@:[36831]: CONTEXT: xlog redo delete: index 1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
> 2013-01-12 01:02:20 UTC::@:[36796]: LOG: startup process (PID 36831) was terminated by signal 6: Aborted
>
> Fortunately, these errors only pertain to indexes, which can be rebuilt. But is this a sign that the data directory on the slave is corrupt?
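If it helps, those CONTEXT lines give you the relfilenodes of the indexes involved (22892842 and 22892841, in the database with OID 16408), so you can at least see exactly which indexes would need rebuilding. A rough sketch of what I would run -- "your_db" and the index name are placeholders, and the lookup has to be done in the database whose OID is 16408:

    # The CONTEXT lines name the indexes by relfilenode, which is not
    # necessarily the same as the OID, so match on relfilenode.
    # First find which database has OID 16408, then look the indexes up there:
    psql -c "SELECT datname FROM pg_database WHERE oid = 16408;"
    psql -d your_db -c "SELECT oid::regclass AS index_name
                          FROM pg_class
                         WHERE relfilenode IN (22892841, 22892842);"

    # Rebuild whatever comes back. REINDEX has to run on the master
    # (the hot standby is read-only); the rebuilt index is WAL-logged
    # and replayed on the standby.
    psql -d your_db -c "REINDEX INDEX name_of_affected_index;"

That at least tells you what you would be rebuilding if you decide the standby's copy can't be trusted.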
>
> Here is a full timeline of the logs highlighting the issues:
>
> 1. Data Directory Copy Finishes.
> 2. Recovery.conf Setup
> 3. Start #1:
>
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: database system was shut down in recovery at 2013-01-12 00:14:06 UTC
> 2013-01-12 00:14:23 UTC:[local]:[unknown]@[unknown]:[61151]: LOG: incomplete startup packet
> 2013-01-12 00:14:23 UTC:[local]:postgres@postgres:[61152]: FATAL: the database system is starting up
> unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: entering standby mode
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: restored log file "0000000900008E45000000B8" from archive
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: restored log file "0000000900008E450000008B" from archive
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: could not open directory "pg_snapshots": No such file or directory
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: redo starts at 8E45/8B173180
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: consistent recovery state reached at 8E45/8B174840
> 2013-01-12 00:14:23 UTC:[local]:postgres@postgres:[61175]: FATAL: the database system is starting up
> 2013-01-12 00:14:23 UTC::@:[61150]: LOG: restored log file "0000000900008E450000008C" from archive
> 2013-01-12 00:14:24 UTC::@:[61150]: LOG: restored log file "0000000900008E450000008D" from archive
>
> *SNIP*
>
> 2013-01-12 00:17:17 UTC::@:[61150]: LOG: restored log file "0000000900008E4800000066" from archive
> 2013-01-12 00:17:17 UTC::@:[61150]: LOG: restored log file "0000000900008E4800000067" from archive
> 2013-01-12 00:17:17 UTC::@:[61150]: WARNING: page 2387476480 of relation base/16408/18967399 does not exist
> 2013-01-12 00:17:17 UTC::@:[61150]: CONTEXT: xlog redo delete: index 1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
> 2013-01-12 00:17:17 UTC::@:[61150]: PANIC: WAL contains references to invalid pages
> 2013-01-12 00:17:17 UTC::@:[61150]: CONTEXT: xlog redo delete: index 1663/16408/22892842; iblk 658355, heap 1663/16408/18967399;
> 2013-01-12 00:17:18 UTC::@:[61121]: LOG: startup process (PID 61150) was terminated by signal 6: Aborted
> 2013-01-12 00:17:18 UTC::@:[61121]: LOG: terminating any other active server processes
>
> 4. PostgreSQL shuts down...
> 5. Debugging logs enabled in postgresql.conf.
> 6. Start #2:
>
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: database system was interrupted while in recovery at log time 2013-01-11 18:05:31 UTC
> 2013-01-12 00:51:48 UTC::@:[36831]: HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> 2013-01-12 00:51:48 UTC:[local]:[unknown]@[unknown]:[36832]: LOG: incomplete startup packet
> 2013-01-12 00:51:48 UTC:[local]:postgres@postgres:[36833]: FATAL: the database system is starting up
> unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: entering standby mode
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: restored log file "0000000900008E45000000B8" from archive
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: checkpoint record is at 8E45/B80AF650
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: redo record is at 8E45/8B173180; shutdown FALSE
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: next transaction ID: 0/552803703; next OID: 24427698
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: next MultiXactId: 109027; next MultiXactOffset: 2442921
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: oldest unfrozen transaction ID: 3104202601, in database 16408
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: transaction ID wrap limit is 956718952, limited by database with OID 16408
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: resetting unlogged relations: cleanup 1 init 0
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: could not open directory "pg_snapshots": No such file or directory
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: initializing for hot standby
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: restored log file "0000000900008E450000008B" from archive
> 2013-01-12 00:51:48 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:51:48 UTC::@:[36831]: LOG: redo starts at 8E45/8B173180
> 2013-01-12 00:51:49 UTC::@:[36831]: LOG: restored log file "0000000900008E450000008C" from archive
>
> *SNIP*
>
> 2013-01-12 00:54:45 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000062" from archive
> 2013-01-12 00:54:45 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:45 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000063" from archive
> 2013-01-12 00:54:45 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:45 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000064" from archive
> 2013-01-12 00:54:45 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000065" from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000066" from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000067" from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: LOG: consistent recovery state reached at 8E48/67AC4E28
> 2013-01-12 00:54:46 UTC::@:[36796]: LOG: database system is ready to accept read only connections
> 2013-01-12 00:54:46 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000068" from archive
> 2013-01-12 00:54:46 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: LOG: restored log file "0000000900008E4800000069" from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: LOG: restored log file "0000000900008E480000006A" from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: LOG: restored log file "0000000900008E480000006B" from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 00:54:47 UTC::@:[36831]: LOG: restored log file "0000000900008E480000006C" from archive
>
> *SNIP*
>
> 2013-01-12 01:02:16 UTC::@:[36831]: LOG: restored log file "0000000900008E4F00000079" from archive
> 2013-01-12 01:02:16 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 01:02:17 UTC::@:[36831]: LOG: restored log file "0000000900008E4F0000007A" from archive
> 2013-01-12 01:02:17 UTC::@:[36831]: DEBUG: got WAL segment from archive
> 2013-01-12 01:02:17 UTC::@:[36831]: WARNING: page 2387607552 of relation base/16408/18967399 does not exist
> 2013-01-12 01:02:17 UTC::@:[36831]: CONTEXT: xlog redo delete: index 1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
> 2013-01-12 01:02:17 UTC::@:[36831]: PANIC: WAL contains references to invalid pages
> 2013-01-12 01:02:17 UTC::@:[36831]: CONTEXT: xlog redo delete: index 1663/16408/22892841; iblk 1075350, heap 1663/16408/18967399;
> 2013-01-12 01:02:20 UTC::@:[36796]: LOG: startup process (PID 36831) was terminated by signal 6: Aborted
> 2013-01-12 01:02:20 UTC::@:[36796]: LOG: terminating any other active server processes
>
> 7. PostgreSQL shuts down...
> 8. Start #3:
>
> 2013-01-12 01:04:04 UTC::@:[61742]: LOG: database system was interrupted while in recovery at log time 2013-01-11 19:50:31 UTC
> 2013-01-12 01:04:04 UTC::@:[61742]: HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> 2013-01-12 01:04:04 UTC:[local]:[unknown]@[unknown]:[61743]: LOG: incomplete startup packet
> 2013-01-12 01:04:04 UTC:[local]:postgres@postgres:[61744]: FATAL: the database system is starting up
> unpigz: /mnt/db/wals/00000009.history does not exist -- skipping
> 2013-01-12 01:04:04 UTC::@:[61742]: LOG: entering standby mode
> 2013-01-12 01:04:04 UTC::@:[61742]: LOG: restored log file "0000000900008E4A00000039" from archive
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: checkpoint record is at 8E4A/39CD4BA0
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: redo record is at 8E4A/19F0D210; shutdown FALSE
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: next transaction ID: 0/552859005; next OID: 24427698
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: next MultiXactId: 109221; next MultiXactOffset: 2443321
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: oldest unfrozen transaction ID: 3104202601, in database 16408
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: transaction ID wrap limit is 956718952, limited by database with OID 16408
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: resetting unlogged relations: cleanup 1 init 0
> 2013-01-12 01:04:04 UTC::@:[61742]: LOG: could not open directory "pg_snapshots": No such file or directory
> 2013-01-12 01:04:04 UTC::@:[61742]: DEBUG: initializing for hot standby
> 2013-01-12 01:04:05 UTC::@:[61742]: LOG: restored log file "0000000900008E4A00000019" from archive
> 2013-01-12 01:04:05 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:04:05 UTC::@:[61742]: LOG: redo starts at 8E4A/19F0D210
> 2013-01-12 01:04:05 UTC::@:[61742]: LOG: restored log file "0000000900008E4A0000001A" from archive
> 2013-01-12 01:04:05 UTC::@:[61742]: DEBUG: got WAL segment from archive
>
> *SNIP*
>
> 2013-01-12 01:09:37 UTC::@:[61742]: LOG: restored log file "0000000900008E4F00000077" from archive
> 2013-01-12 01:09:37 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:37 UTC::@:[61742]: LOG: restored log file "0000000900008E4F00000078" from archive
> 2013-01-12 01:09:37 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:37 UTC::@:[61742]: LOG: restored log file "0000000900008E4F00000079" from archive
> 2013-01-12 01:09:37 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: LOG: restored log file "0000000900008E4F0000007A" from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: LOG: consistent recovery state reached at 8E4F/7A22BD08
> 2013-01-12 01:09:38 UTC::@:[61729]: LOG: database system is ready to accept read only connections
> 2013-01-12 01:09:38 UTC::@:[61742]: LOG: restored log file "0000000900008E4F0000007B" from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: LOG: restored log file "0000000900008E4F0000007C" from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: LOG: restored log file "0000000900008E4F0000007D" from archive
> 2013-01-12 01:09:38 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:09:39 UTC::@:[61742]: LOG: restored log file "0000000900008E4F0000007E" from archive
>
> *SNIP*
>
> 2013-01-12 01:14:29 UTC::@:[61742]: LOG: restored log file "0000000900008E53000000D9" from archive
> 2013-01-12 01:14:29 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:14:29 UTC::@:[61742]: LOG: restored log file "0000000900008E53000000DA" from archive
> 2013-01-12 01:14:29 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:14:29 UTC::@:[61742]: LOG: restored log file "0000000900008E53000000DB" from archive
> 2013-01-12 01:14:29 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:14:30 UTC::@:[61742]: LOG: restored log file "0000000900008E53000000DC" from archive
> 2013-01-12 01:14:30 UTC::@:[61742]: DEBUG: got WAL segment from archive
> 2013-01-12 01:14:30 UTC::@:[61742]: LOG: restored log file "0000000900008E53000000DD" from archive
> 2013-01-12 01:14:30 UTC::@:[61742]: DEBUG: got WAL segment from archive
> unpigz: /mnt/db/wals/0000000900008E53000000DE does not exist -- skipping
> 2013-01-12 01:14:30 UTC::@:[61742]: LOG: unexpected pageaddr 8E4A/1000000 in log file 36435, segment 222, offset 0
> unpigz: /mnt/db/wals/0000000900008E53000000DE does not exist -- skipping
> 2013-01-12 01:14:30 UTC::@:[19504]: LOG: streaming replication successfully connected to primary
> 2013-01-12 01:18:08 UTC::@:[61772]: DEBUG: checkpoint sync: number=1 file=base/16408/22873432 time=2.538 msec
> 2013-01-12 01:18:08 UTC::@:[61772]: DEBUG: checkpoint sync: number=2 file=base/16408/18967506 time=12.054 msec
> 2013-01-12 01:18:08 UTC::@:[61772]: DEBUG: checkpoint sync: number=3 file=base/16408/18967506_fsm time=0.095 msec
> 2013-01-12 01:18:08 UTC::@:[61772]: DEBUG: checkpoint sync: number=4 file=base/16408/22873244 time=0.144 msec
> 2013-01-12 01:18:08 UTC::@:[61772]: DEBUG: checkpoint sync: number=5 file=base/16408/22892823 time=0.087 msec
>
> 9. Slave DB connected to streaming replication with Master DB and all seems fine.
>
> Any help would be appreciated. Thanks!
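Coming back to my question about the copy mechanism: by "explicitly copying $PGDATA" I mean a low-level file-system base backup along these lines. This is only a sketch -- the host name, paths, and exclusions are placeholders, not what you actually ran:

    # On the master: enter backup mode, copy the data directory to the
    # standby, then leave backup mode.
    psql -c "SELECT pg_start_backup('standby_rebuild', true);"
    rsync -a --delete --exclude=pg_xlog/ --exclude=postmaster.pid \
        "$PGDATA"/ standby-host:/path/to/standby/pgdata/
    psql -c "SELECT pg_stop_backup();"

pg_basebackup (available from 9.1 on) does essentially the same thing in one step. Knowing which route you took, and which PostgreSQL version this is, would make it easier to compare with my case.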
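And for completeness, this is roughly the kind of recovery.conf I would expect on the standby for an archive-restore-plus-streaming setup like the one your logs show. The restore_command and connection string here are guesses for illustration only -- your logs suggest you decompress WAL with unpigz from /mnt/db/wals, so your actual command will differ:

    # Minimal 9.x-style recovery.conf: restore from the WAL archive,
    # then switch to streaming replication. Host/user are placeholders.
    cat > "$PGDATA"/recovery.conf <<'EOF'
    standby_mode = 'on'
    restore_command = 'cp /mnt/db/wals/%f %p'
    primary_conninfo = 'host=master.example.com port=5432 user=replication'
    EOF

If your setup differs from that in any interesting way, it would be useful to know for pinning down what we might have in common.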