Re: Postgres WAL Recovery Fails... And Then Works...

Heikki Linnakangas <hlinnakangas@xxxxxxxxxx> · Tue, 15 Jan 2013 12:51:50 +0200

On 12.01.2013 04:32, Phil Monroe wrote:
Hi Everyone,

So we had to fail over and do a full base backup to get our slave database back
online, and ran into an interesting scenario. After copying the data directory,
setting up recovery.conf, and starting the slave database, the database
crashed while replaying xlogs. However, on a second start, the
database was able to replay xlogs farther than it initially got, but ultimately
failed again. After starting the DB a third time, PostgreSQL
replayed even further and caught up to the master to start streaming
replication. Is this common and/or acceptable?

How did you perform the base backup? Did you use pg_basebackup? Or if 
you did a filesystem-level copy, did you call pg_start_backup() before 
the copy and pg_stop_backup() after it? Did you take the base backup 
from the master server, or from another slave?
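
A minimal sketch of the ordering a filesystem-level copy needs, in case it 
helps to compare against what was actually run. The host name, paths, user 
and backup label are hypothetical, and rsync is just one possible copy tool. 
The function only builds the command lists; it does not execute anything.

```python
def base_backup_steps(master_host, data_dir, dest, user="postgres",
                      label="base_backup"):
    """Return the commands for a filesystem-level base backup, in order.

    All arguments are illustrative placeholders, not values from this thread.
    """
    return [
        # 1. Tell the master a backup is starting (forces a checkpoint).
        ["psql", "-h", master_host, "-U", user,
         "-c", f"SELECT pg_start_backup('{label}')"],
        # 2. Copy the data directory while the backup is in progress.
        #    postmaster.pid belongs to the running master, so skip it.
        ["rsync", "-a", "--exclude=postmaster.pid",
         f"{master_host}:{data_dir}/", dest],
        # 3. Only after the copy has finished, end the backup.
        ["psql", "-h", master_host, "-U", user,
         "-c", "SELECT pg_stop_backup()"],
    ]
```

Each list could be handed to subprocess.run(cmd, check=True) in turn. If the 
copy ran outside the start/stop pair, the backup is not guaranteed to be 
consistent.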

This looks similar to the bug discussed here: 
http://www.postgresql.org/message-id/CAMkU=1wpvYJVEDo6Qvq4QbosZ+AV6BMVCf+XVCG=mJqFRjQ8Pg@xxxxxxxxxxxxxx. 
That was fixed in 9.2.2, so if you're using 9.2.1 or 9.2.0, try upgrading.
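
As a rough illustration of that check, the sketch below takes the string 
returned by "SELECT version()" (or a bare version number) and reports whether 
it is one of the two releases named above. The function names are made up 
for this example, and it deliberately says nothing about other branches, 
since only 9.2.0 and 9.2.1 are mentioned here.

```python
import re

FIXED_IN = (9, 2, 2)  # release named above as containing the fix


def parse_pg_version(text):
    """Extract (major, minor, patch) from e.g. 'PostgreSQL 9.2.1 on ...'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(text):
    """True only for 9.2.x releases older than 9.2.2 (9.2.0 and 9.2.1)."""
    version = parse_pg_version(text)
    return version[:2] == (9, 2) and version < FIXED_IN
```

For example, needs_upgrade("PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu") 
is true, while needs_upgrade("9.2.2") is false.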

- Heikki

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)