On Fri, 2007-04-27 at 12:31 +0200, Nico Sabbi wrote: > I have some doubts regarding the settings and the access procedure of > warm standby servers: > - can autovacuum be safely enabled on the replicator? Not a relevant question. The standby isn't "up", so no SQL can be executed on the standby, so it never needs vacuuming. The autovacuum process doesn't start until recovery finishes. That is why it is Warm rather than Hot standby. Florian Pflug is working on allowing read-only queries to execute on the standby, so we're hopeful of further enhancements in the next release. > - I'm using pg_standby (from cvs) that is generally working well as > expected (logs are copied with > scp); today I wanted to temporarily stop the replication to verify > some data to restart it later on, so > I touched the trigger file, waited for the log to report "database > ready", verified that the > databases were actually up-to-date. All was fine, then I ran > > rm -f pg_xlog/* pg_xlog/archive_status/* > mv recovery.done recovery.conf (the permissions were right) > /etc/init.d/postgresql stop ; /etc/init.d/postgresql start > > the replication seemed to start: > ---- > --------------------------------------------------- > LOG: database system was shut down at 2007-04-27 12:16:13 CEST > LOG: starting archive recovery > LOG: restore_command = "/usr/local/bin/pg_standby -s 5 -w 0 -t > /usr/local/postgres_replica/trigger /usr/local/postgres_replica/log/ %f %p" > cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No > such file or directory > cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No > such file or directory > cp: cannot stat `/usr/local/postgres_replica/log//00000001.history': No > such file or directory Looks like there's an issue with double slashes on the archive filename, which I will fix. This probably isnt the problem though. > then I updated the master with a batch of inserts, but after a while the > slave stopped with > these messages: > > LOG: restored log file "000000010000000000000021" from archive > LOG: record with zero length at 0/21000048 > LOG: invalid primary checkpoint record > LOG: restored log file "000000010000000000000020" from archive > LOG: restored log file "000000010000000000000021" from archive > LOG: invalid resource manager ID in secondary checkpoint record > PANIC: could not locate a valid checkpoint record > LOG: startup process (PID 19619) was terminated by signal 6 > LOG: aborting startup due to startup process failure Please run pg_controldata to print out the control file. Backup all the files in case we need to inspect them. What was the ending log sequence number (e.g. x/xxxx) from the previous recovery? I'll see if I can re-create this. > What did I do wrong? Is there any other procedure to follow to restart a > stopped replication? You're right, using the trigger is not the right way to stop/start the standby. Just stop/start the standby server normally. The trigger means that you'd like to perform a failover. There is a patch not yet applied which will make a new version of pg_standby. pg_standby's official status right now is beta, so please expect, look for and report any issues you find. Thanks. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com