Re: WAL Log Shipping - Warm Standby not working under 8.3.7

Keaton Adams <kadams@xxxxxxxxxxx> · Fri, 8 Jan 2010 10:50:12 -0700


OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored last evening's backup of the master DB onto the slave, so that the rsync across the network would finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');

rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}

select pg_stop_backup();

ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null

ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null 

ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start
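One check that may help narrow this down: pg_start_backup() writes a backup_label file into the master's data directory, and its "START WAL LOCATION" line names the segment that recovery will ask for first. Because the rsync above copies ${masterdb}/*, that file should land in ${slavedb} as well. The helper below is only a sketch for reading that segment name; the function name backup_start_wal is made up for illustration and is not part of the script above.

```shell
# Hypothetical helper: print the WAL segment named on the
# "START WAL LOCATION" line of backup_label in the given data directory.
backup_start_wal() {
    label_file=$1/backup_label
    if [ ! -r "$label_file" ]; then
        echo "no readable backup_label in $1" >&2
        return 1
    fi
    # The line looks like: START WAL LOCATION: X/Y (file <24 hex digits>)
    sed -n 's/^START WAL LOCATION: .*(file \([0-9A-F]\{24\}\)).*$/\1/p' "$label_file"
}
```

Running backup_start_wal "${slavedb}" on the slave before starting it, and comparing the result with the segment the slave asks for, would show whether the copied label matches the backup that was just taken.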

When the slave PG instance tries to come up in recovery mode, it aborts because it is looking for a WAL file that is extremely old and no longer exists on the master DB server.  I believe the Master PG instance was restarted on 12/28/09 and has been running ever since.  Is there a way to reset the “last completed transaction” on a DB?  Why is PG looking so far back for a WAL file to begin recovery, when so much has been done since the 28th, including daily backups?

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request

/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5

File /mxl/var/pgsql/data/stopslave found. Aborting Process.

<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory

<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378

<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07

Waiting for log: 00000001000000F6000000E8

<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15
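For anyone comparing these names against what is actually in the archive: a WAL segment file name is three 8-digit hexadecimal fields (timeline, log file, segment), which is where "log file 246, segment 233" in the message above comes from. A small sketch for decoding a name; the function name decode_wal_name is invented here for illustration.

```shell
# Hypothetical helper: split a 24-hex-digit WAL segment file name into
# its timeline, log file and segment fields and print them in decimal.
decode_wal_name() {
    name=$1
    # Reject anything that is not exactly 24 hexadecimal digits.
    case $name in
        *[!0-9A-Fa-f]*) echo "not a WAL segment name: $name" >&2; return 1 ;;
    esac
    if [ "${#name}" -ne 24 ]; then
        echo "not a WAL segment name: $name" >&2
        return 1
    fi
    tli=$(printf '%s' "$name" | cut -c1-8)
    log=$(printf '%s' "$name" | cut -c9-16)
    seg=$(printf '%s' "$name" | cut -c17-24)
    # printf %d accepts a 0x prefix and converts hex to decimal.
    printf 'timeline=%d log=%d seg=%d\n' "0x$tli" "0x$log" "0x$seg"
}

decode_wal_name 00000001000000F6000000E9   # prints: timeline=1 log=246 seg=233
```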

Again, nothing was changed with the scripts or the replication process and this worked just fine under 8.1.4.

Thanks!

On 1/8/10 8:10 AM, "Keaton Adams" <kadams@xxxxxxxxxxx> wrote:

I did find some references to a fix for the last-completed-transaction time, and I looked in the postgresql-bugs archive, but I’m not having any luck confirming that this is a problem in 8.3.7 or that an upgrade to 8.3.9 would fix the issue.

postgresql 8.3.7 .... Fix incorrect logging of last-completed-transaction time during PITR ..... Last transaction end time is now logged at end of recovery and at each logged restart point (Simon) ...

On 1/7/10 12:53 PM, "Keaton Adams" <kadams@xxxxxxxxxxx> wrote:

We had WAL Log shipping (warm standby) working fine under 8.1.4 but under 8.3.7 we can’t get the slave to come up properly.  Nothing has changed in our process with regard to start_backup, rsync, stop_backup, bring up the warm standby server in continuous recovery mode, but the failover DB won’t start with the following error:

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request

/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5

File /mxl/var/pgsql/data/stopslave found. Aborting Process.

<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory

<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378

<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07

Waiting for log: 00000001000000F6000000E8

<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15
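The "Waiting for log" and "stopslave found" lines appear to come from the standby's restore script, which polls the archive for the next segment and gives up when a trigger file shows up. For readers unfamiliar with that pattern, here is a minimal sketch of such a loop. It is not the actual ha_copy.sh; the function name wait_for_wal, its argument order, and the WAIT_SECS variable are all assumptions made for illustration. In recovery.conf, a restore_command would typically pass %f (the segment name) and %p (the destination path) to a script like this.

```shell
# Hypothetical restore loop:
#   wait_for_wal ARCHIVE_DIR WAL_NAME DEST_PATH TRIGGER_FILE
# Polls ARCHIVE_DIR until WAL_NAME appears, then copies it to DEST_PATH.
# Returns non-zero if TRIGGER_FILE exists, so the server stops waiting.
wait_for_wal() {
    archive_dir=$1
    wal_name=$2
    dest=$3
    trigger=$4
    while :; do
        # A trigger file means the operator wants the standby to stop waiting.
        if [ -e "$trigger" ]; then
            echo "File $trigger found. Aborting." >&2
            return 1
        fi
        # Segment has arrived: hand it to the server and report cp's status.
        if [ -f "$archive_dir/$wal_name" ]; then
            cp "$archive_dir/$wal_name" "$dest"
            return $?
        fi
        echo "Waiting for log: $wal_name" >&2
        sleep "${WAIT_SECS:-5}"
    done
}
```

A non-zero exit from the restore command tells the server the requested file is not available, so the loop only returns failure once the trigger file is present.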

The log file it refers to is very old and is not in pg_xlog on the Master PG server, and the “last completed transaction” time can’t be right either.  Is this a bug, or is it something we are doing wrong?

Thanks,

Keaton

psql (PostgreSQL) 8.3.7

contains support for command-line editing

RHEL 5 64 Bit

Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux