Re: PITR restore hot standby

"pgsql-general@xxxxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxxxx> · Thu, 02 Jun 2005 01:13:36 +0300

thanks for your reply,

I did not presume that it is a bug, I am interested to know how can I 
setup a "hot standby" (if is some more documentation available).

===QUOTE FROM DOCS===
If we continuously feed the series of WAL files to another machine that 
has been loaded with the same base backup file, we have a "hot standby" 
system: at any point we can bring up the second machine and it will have 
a nearly-current copy of the database.
===QUOTE FROM DOCS===

I have on average about 30-50GB of WAL files (gzip compressed) and a 
complete restore takes about 1-3 hours. From your answer I understand 
that once a restore is complete it can not be "resumed", the only way to 
add a new WAL file is to restore the original backup and to "play" all 
WAL files (including the new one).

In this case do you know what is the timeout on the "restore_command" 
command (or where I can find it in documentation).
What will happen if I make the restore command wait for a new WAL file 
to become available ?
And, most important, how do I distinguish between WAL files that should 
be in the archive from those that should not be in the archive, because 
I see in the documentation that "The command /will/ be asked for log 
files that are not present in the archive; it must return nonzero when 
so asked" ?

Are timelines helping me in this case ?

thanks again,
Razvan Radu
CORE Technologies

Simon Riggs wrote:

On Mon, 2005-05-23 at 16:17 +0300, Postgres General wrote:

I am trying to setup a "hot standby" on a second machine.
I have created a "recovery.conf" file and started a restore with logs 
from the primary machine. everything was OK.

now a have new transaction logs generated by the primary machine and I 
want to "play" them on the secondary one. I have stopped postgres, 
recreated "recovery.conf", started postgres and I get the following error:

----------------------------------------------------------------
LOG:  database system was shut down at 2005-05-23 05:19:34 PDT
LOG:  starting archive recovery
LOG:  restore_command = "/usr/cbmp/core/bin/restore_pg_tlog %f %p"
LOG:  restored log file "0000000100000008000000C4" from archive
LOG:  invalid resource manager ID 53 at 8/C4FFFEF8
LOG:  invalid primary checkpoint record
LOG:  restored log file "0000000100000008000000C4" from archive
LOG:  invalid resource manager ID 52 at 8/C4FFFEBC
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 18297) was terminated by signal 6
LOG:  aborting startup due to startup process failure
LOG:  logger shutting down
----------------------------------------------------------------

what is the procedure for creating a "hot standby" (continuously feeding 
a series of WAL files created by the primary machine into the secondary 
one) ?

Sounds like you've tried to do two recoveries on the same server. The
control file doesn't match the log files you've provided to it in your
script/program, so the recovery has not been setup correctly. This
doesn't look like a bug...

The backup system doesn't know about the primary, so just let the log
files stream in and it will work. Once the recovery terminates normally,
you cannot restart it. You need to perform the base backup again and
then begin streaming the files through once more.

If you have more information, perhaps it would be possible to say more.

Professional support is available. Some enhancements should be available
in 8.1, as well as further documentation.

Best Regards, Simon Riggs
http://www.2ndquadrant.com

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)