Some problem with warm standby server

Nico Sabbi <nsabbi@xxxxxxxxxxxxxxxxxxx> · Fri, 27 Apr 2007 12:31:26 +0200

Hi,
I have some questions about the settings and the access procedure for
warm standby servers:
- Can autovacuum be safely enabled on the standby?
- I'm using pg_standby (from CVS), which is generally working as
  expected (logs are copied with scp). Today I wanted to stop the
  replication temporarily to verify some data and restart it later, so
  I touched the trigger file, waited for the log to report "database
  ready", and verified that the databases were actually up to date.
  All was fine, then I ran:

 rm -f pg_xlog/* pg_xlog/archive_status/*
 mv recovery.done recovery.conf (the permissions were right)
 /etc/init.d/postgresql stop ; /etc/init.d/postgresql start

 The replication seemed to start:

LOG:  database system was shut down at 2007-04-27 12:16:13 CEST
LOG:  starting archive recovery
LOG:  restore_command = "/usr/local/bin/pg_standby -s 5 -w 0 -t /usr/local/postgres_replica/trigger  /usr/local/postgres_replica/log/ %f %p"
cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No such file or directory
cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No such file or directory
cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No such file or directory

Then I updated the master with a batch of inserts, but after a while
the slave stopped with these messages:

LOG:  restored log file "000000010000000000000021" from archive
LOG:  record with zero length at 0/21000048
LOG:  invalid primary checkpoint record
LOG:  restored log file "000000010000000000000020" from archive
LOG:  restored log file "000000010000000000000021" from archive
LOG:  invalid resource manager ID in secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 19619) was terminated by signal 6
LOG:  aborting startup due to startup process failure
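
For reading the log above: as far as I understand, a WAL segment file
name is three 8-digit hexadecimal fields (timeline, log id, segment
number), and a location such as 0/21000048 is a log id followed by a
byte offset. The following is only a minimal sketch of that mapping,
assuming the default 16 MB segment size; the function names wal_fields
and lsn_to_segment are made up for illustration and are not PostgreSQL
tools.

```shell
#!/bin/sh
# Illustrative helpers only; names are hypothetical, not part of PostgreSQL.

# Split a 24-hex-digit WAL segment file name into its three 8-digit fields.
wal_fields() {
    name=$1
    tli=$(printf '%s' "$name" | cut -c1-8)
    log=$(printf '%s' "$name" | cut -c9-16)
    seg=$(printf '%s' "$name" | cut -c17-24)
    echo "timeline=$tli log=$log seg=$seg"
}

# Map a timeline and a location like 0/21000048 to a segment file name.
# ASSUMPTION: default segment size of 16 MB (16777216 bytes).
lsn_to_segment() {
    tli=$1
    lsn=$2
    logid=${lsn%%/*}     # part before the slash, hexadecimal
    off=${lsn##*/}       # part after the slash, hexadecimal byte offset
    seg=$(( 0x$off / 16777216 ))
    printf '%08X%08X%08X\n' "$tli" "$(( 0x$logid ))" "$seg"
}

wal_fields 000000010000000000000021
# prints: timeline=00000001 log=00000000 seg=00000021

lsn_to_segment 1 0/21000048
# prints: 000000010000000000000021
```

Under that assumption, the "record with zero length at 0/21000048"
message refers to a position inside the restored file
000000010000000000000021.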

What did I do wrong? Is there a different procedure I should follow to
restart a stopped replication?
Thanks,
   Nico