On Apr 29, 17:16, e...@xxxxxxxxxx (Erik Jones) wrote:
> On Apr 29, 2008, at 3:20 AM, wstrzalka wrote:
>
> >> What is the full pg_standby command string (restore_command=....) in
> >> your recovery.conf? It sounds like you have pg_standby set to delete
> >> archived WALs, possibly a little too aggressively. Do you have the
> >> -k flag set in your pg_standby call in your restore_command?
>
> > My restore command is:
> > ------------------------------------------------------------------
> > restore_command = 'pg_standby -l -d -s 5 -w 0 -t /tmp/pgsql.promote_trigger.5432 ~postgres/incoming_wal %f %p %r 2>&1 | logger -p local1.info -t pitr-standby'
> > ------------------------------------------------------------------
>
> > As you can see, I didn't set -k to keep a fixed number of WALs; I
> > pass the %r parameter instead, so PostgreSQL controls the number of
> > kept files automatically (or at least it should).
>
> Ok, I hadn't yet set up a standby on 8.3 and so hadn't seen that the
> %r macro obviates the need for the -k flag. So...
>
> The output from pg_standby:
> ------------------------------------
> Trigger file         : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 00000001.history
> WAL file path        : /var/lib/pgsql/incoming_wal/00000001.history
> Restoring to...      : pg_xlog/RECOVERYHISTORY
> Sleep interval       : 5 seconds
> Max wait interval    : 0 forever
> Command for restore  : ln -s -f "/var/lib/pgsql/incoming_wal/00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history : 0000000100000001000000DB and later
> running restore      : OK
>
> Trigger file         : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 0000000100000001000000D9.00000020.backup
> WAL file path        : /var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup
> Restoring to...      : pg_xlog/RECOVERYHISTORY
> Sleep interval       : 5 seconds
> Max wait interval    : 0 forever
> Command for restore  : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
> Keep archive history : 0000000100000001000000DB and later
> running restore      : OK
>
> Note that here, from the start, postgres is telling the recovery
> command that it only needs 0000000100000001000000DB and later.
>
> Here's where it gets to restoring the first actual log file:
>
> Trigger file         : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 0000000100000001000000D9
> WAL file path        : /var/lib/pgsql/incoming_wal/0000000100000001000000D9
> Restoring to...      : pg_xlog/RECOVERYXLOG
> Sleep interval       : 5 seconds
> Max wait interval    : 0 forever
> Command for restore  : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
> Keep archive history : 0000000100000001000000DB and later
> running restore      : OK
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
>
> Since it says 'OK' but then fails, my guess is that the order of
> operations goes something along the lines of this (I could be totally
> off):
>
> 1. Is /var/lib/pgsql/incoming_wal/0000000100000001000000D9 present? -> OK
> 2. Clean up files older than 0000000100000001000000DB -> Delete
>    /var/lib/pgsql/incoming_wal/0000000100000001000000D9 and
>    /var/lib/pgsql/incoming_wal/0000000100000001000000DA
> 3. Restore /var/lib/pgsql/incoming_wal/0000000100000001000000D9 -> This
>    is where it breaks.
>
> So, the question is: why does the server say that it only needs
> 0000000100000001000000DB and later? Did you clear out your pg_xlog
> directory before starting up the standby?

Yes - the value passed for %r looks wrong from the start.
Generally I like %r because I don't need to worry about whether there
are enough WALs to continue recovery after a standby reboot, and I
don't have to keep many files around at the same time, but I think
something is wrong with it.

And to answer your question - I don't delete any files before starting
the standby.

So it looks like a bug to me - probably I should submit it to
pgsql.bugs - unfortunately (or fortunately :D) my test environment is
production now, so I won't be able to reproduce it easily.
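
To make the suspected failure mode concrete, here is a minimal Python
sketch of the order of operations guessed at in the quoted message. It
is not pg_standby's actual code: the function names and the in-memory
set standing in for the archive directory are made up for illustration.
It assumes only that WAL segment names on the same timeline sort
correctly as plain strings, and that everything sorting before the %r
value is removed.

```python
def is_wal_segment(name):
    """True for plain 24-hex-digit WAL segment names (no .history/.backup)."""
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)


def simulate_restore(archive, requested, restart_file):
    """Replay the guessed order: check, clean up, then restore.

    archive      -- set of names standing in for the archive directory
                    (hypothetical; modified in place)
    requested    -- the file the server asks for (%f)
    restart_file -- oldest file the server says it still needs (%r)
    Returns (found, removed, restored).
    """
    # Step 1: is the requested file present?
    found = requested in archive

    # Step 2: remove every segment that sorts before the restart file.
    removed = sorted(
        f for f in archive if is_wal_segment(f) and f < restart_file
    )
    archive.difference_update(removed)

    # Step 3: the restore only works if the file survived the cleanup.
    restored = requested in archive
    return found, removed, restored


archive = {
    "0000000100000001000000D9",
    "0000000100000001000000DA",
    "0000000100000001000000DB",
}
found, removed, restored = simulate_restore(
    archive,
    requested="0000000100000001000000D9",
    restart_file="0000000100000001000000DB",
)
print(found, removed, restored)
```

With the values from the log, D9 is found, D9 and DA are removed, and
the restore of D9 then has nothing left to work with. The log prints
the "removing" lines after "running restore : OK", so the real order
may differ from this guess; since -l makes the restore a symlink,
removing the target afterwards would leave the server without the file
as well. Either way, the sketch only works out if %r never points past
the file being requested, which is exactly what looks wrong here.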