On 11/6/13, 12:26 PM, Jeff Ross wrote:
On 11/6/13, 11:32 AM, Jeff Janes
wrote:
Hi Jeff,
Thanks for the reply. Oops, I copied one of the many changes to
the script, but not the one with the rsync to copy /wal from the
primary to the standby.
I should have mentioned that wal archiving is setup and working
from the primary to the standby. It saves wal both on the locally
on the primary and remotesly on the standby.
I moved the rsync line to copy wal from primary to secondary after
pg_stop_backup but I'm still getting the same panic on the
standby.
Here's the real, honest version of the script I use to start the
hot standby:
_postgresql@nirvana:/var/postgresql $ cat start_hot_standby.sh
#!/bin/sh
backup_label=wykids_`date +%Y-%m-%d`
#remove any existing wal files on the secondary
ssh dukkha.internal "rm -rf /wal/*"
ssh dukkha.internal sudo /usr/local/bin/svc -d
/service/postgresql.5432
psql -c "select pg_start_backup('$backup_label');"
template1
rsync \
--copy-links \
--delete \
--exclude=backup_label \
--exclude=postgresql.conf \
--exclude=recovery.done \
-e ssh -avz /var/postgresql/data.93.5432/ \
dukkha.internal:/var/postgresql/data.93.5432/
ssh dukkha.internal "rm -f
/var/postgresql/data.93.5432/pg_xlog/*"
ssh dukkha.internal "rm -f
/var/postgresql/data.93.5432/pg_xlog/archive_status/*"
ssh dukkha.internal "rm -f
/var/postgresql/data.93.5432/pg_log/*"
ssh dukkha.internal "rm -f
/var/postgresql/data.93.5432/postmaster.pid"
ssh dukkha.internal "ln -s /var/postgresql/recovery.conf
/var/postgresql/data.93.5432/recovery.conf"
psql -c "select pg_stop_backup();" template1
rsync -e ssh -avz /wal/ dukkha.internal:/wal/
ssh dukkha.internal sudo /usr/local/bin/svc -u
/service/postgresql.5432
Here are the logs on the standby after running the above:
2013-11-06 11:56:30.792461500 <%> LOG: database system
was interrupted; last known up at 2013-11-06 11:52:22 MST
2013-11-06 11:56:30.800685500 <%> LOG: entering
standby mode
2013-11-06 11:56:30.800891500 <%> LOG: invalid
primary checkpoint record
2013-11-06 11:56:30.800930500 <%> LOG: invalid
secondary checkpoint record
2013-11-06 11:56:30.801004500 <%> PANIC: could not
locate a valid checkpoint record
Jeff
My apologies to Jeff--I'd missed his in-line comment above that I
should *not* exclude the backup label from the rsync of the primary
to the standby. As soon as I removed that exclusion and with his
other suggested change that I should copy the /wal from the primary
to the standby after pg_stop_backup, streaming replication started
on the standby exactly as it should.
Logs from the standby:
2013-11-07 09:21:15.273712500 <%> LOG: database system
was interrupted; last known up at 2013-11-07 09:16:05 MST
2013-11-07 09:21:15.286834500 <%> LOG: entering
standby mode
2013-11-07 09:21:16.873654500 <%> LOG: restored log
file "000000010000000000000050" from archive
2013-11-07 09:21:16.936355500 <%> LOG: redo starts
at 0/50000024
2013-11-07 09:21:17.129718500 <%> LOG: consistent
recovery state reached at 0/50036D6C
2013-11-07 09:21:17.131933500 <%> LOG: database
system is ready to accept read only connections
2013-11-07 09:21:17.136856500 cp:
/wal/000000010000000000000051: No such file or directory
2013-11-07 09:21:17.194811500 <%> LOG: started
streaming WAL from primary at 0/51000000 on timeline 1
Jeff Ross
|