On Apr 29, 2008, at 3:20 AM, wstrzalka wrote:
What is the full pg_standby command string (restore_command=....) in
your recovery.conf? It sounds like you have pg_standby set to delete
archived WALs and possibly have that a little too aggressive. Do you
have the -k flag set in your pg_standby call in your restore_command?
My restore command is:
-----------------------------------------------------------------------------------------
restore_command = 'pg_standby -l -d -s 5 -w 0 -t /tmp/pgsql.promote_trigger.5432 ~postgres/incoming_wal %f %p %r 2>&1 | logger -p local1.info -t pitr-standby'
-----------------------------------------------------------------------------------------
As you can see, I didn't set -k to keep a fixed number of WALs. Instead
I pass the %r parameter, so PostgreSQL controls the number of kept files
automatically (or at least it should).
Ok, I hadn't yet set up a standby on 8.3 and so hadn't seen that the
%r macro obviates the need for the -k flag. So...
The output from pg_standby:
------------------------------------
Trigger file : /tmp/pgsql.promote_trigger.5432
Waiting for WAL file : 00000001.history
WAL file path : /var/lib/pgsql/incoming_wal/00000001.history
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 0000000100000001000000DB and later
running restore : OK
Trigger file : /tmp/pgsql.promote_trigger.5432
Waiting for WAL file : 0000000100000001000000D9.00000020.backup
WAL file path : /var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 0000000100000001000000DB and later
running restore : OK
Note that here, from the start, postgres is telling the recovery
command that it only needs from 0000000100000001000000DB and on.
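To make that cutoff concrete, here is a rough Python sketch of how a segment name breaks down. It assumes the usual 24-hex-digit layout (8 digits each for timeline, log and segment); the helper names parse_wal_name and is_older are made up for illustration and are not part of pg_standby.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment).

    Assumption: three fixed-width 8-digit hexadecimal fields.
    """
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


def is_older(name, keep_from):
    """True if `name` sorts strictly before the keep-from segment."""
    return parse_wal_name(name) < parse_wal_name(keep_from)
```

Under that assumption, both 0000000100000001000000D9 and 0000000100000001000000DA compare as older than 0000000100000001000000DB, so they fall below the "Keep archive history" threshold reported in the output.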
Here's where it gets to restoring the first actual log file:
Trigger file : /tmp/pgsql.promote_trigger.5432
Waiting for WAL file : 0000000100000001000000D9
WAL file path : /var/lib/pgsql/incoming_wal/0000000100000001000000D9
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
Keep archive history : 0000000100000001000000DB and later
running restore : OK
removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
Since it says 'OK' but then fails, my guess is that the order of
operations goes something along the lines of this (I could be totally
off):
1. Is /var/lib/pgsql/incoming_wal/0000000100000001000000D9 present? -> OK
2. Clean up files older than 0000000100000001000000DB -> Delete
   /var/lib/pgsql/incoming_wal/0000000100000001000000D9 and
   /var/lib/pgsql/incoming_wal/0000000100000001000000DA
3. Restore /var/lib/pgsql/incoming_wal/0000000100000001000000D9 -> This
   is where it breaks.
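If that guess is right, the failure is easy to reproduce outside of PostgreSQL. The Python sketch below only replays the three guessed steps against a scratch directory; it is not pg_standby's actual code, and simulate_restore and demo are hypothetical names. It also assumes "older" means a plain string comparison on the 24-character file names.

```python
import os
import tempfile


def simulate_restore(archive_dir, wanted, keep_from):
    """Replay the three guessed steps; return (removed_names, still_restorable)."""
    source = os.path.join(archive_dir, wanted)

    # Step 1: is the requested file present in the archive?
    if not os.path.exists(source):
        return [], False

    # Step 2: clean up every 24-character segment file sorting before keep_from.
    removed = []
    for name in sorted(os.listdir(archive_dir)):
        if len(name) == 24 and name < keep_from:
            os.remove(os.path.join(archive_dir, name))
            removed.append(name)

    # Step 3: restore the requested file, which only works if it survived step 2.
    return removed, os.path.exists(source)


def demo():
    """Build a scratch archive holding segments D9..DC and request D9."""
    prefix = "0000000100000001000000"
    with tempfile.TemporaryDirectory() as archive_dir:
        for suffix in ("D9", "DA", "DB", "DC"):
            open(os.path.join(archive_dir, prefix + suffix), "w").close()
        return simulate_restore(archive_dir, prefix + "D9", prefix + "DB")
```

Calling demo() removes the D9 and DA segments in step 2, so by step 3 the requested D9 file is no longer there, which is the same pair of "removing" lines as in the output above.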
So, the question is: why does the server say that it only needs
0000000100000001000000DB and later? Did you clear out your pg_xlog
directory before starting up the standby?
Erik Jones
DBA | Emma®
erik@xxxxxxxxxx