Re: PITR problem

wstrzalka <wstrzalka@xxxxxxxxx> · Wed, 30 Apr 2008 02:32:35 -0700 (PDT)

On 29 Apr, 17:16, e...@xxxxxxxxxx (Erik Jones) wrote:
> On Apr 29, 2008, at 3:20 AM, wstrzalka wrote:
>
>
>
> >> What is the full pg_standby command string (restore_command=....) in
> >> your recovery.conf?  It sounds like you have pg_standby set to
> >> delete archived WALs, and possibly a little too aggressively.  Do you
> >> have the -k flag set in your pg_standby call in your restore_command?
>
> > My restore command is:
> > -----------------------------------------------------------------------------------------
> > restore_command = 'pg_standby -l -d -s 5 -w 0 -t /tmp/pgsql.promote_trigger.5432 ~postgres/incoming_wal %f %p %r 2>&1 | logger -p local1.info -t pitr-standby'
> > -----------------------------------------------------------------------------------------
>
> > As you can see, I didn't set -k to keep a fixed number of WALs; instead
> > I pass the %r parameter, so PostgreSQL controls the number of kept files
> > automatically (or at least it should).
>
> Ok, I hadn't yet set up a standby on 8.3 and so hadn't seen that the
> %r macro obviates the need for the -k flag.  So...
>
> The output from pg_standby:
> ------------------------------------
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 00000001.history
> WAL file path            : /var/lib/pgsql/incoming_wal/00000001.history
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9.00000020.backup
> WAL file path            : /var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
> Note that here, from the start, postgres is telling the recovery
> command that it only needs from 0000000100000001000000DB and on.
>
> Here's where it gets to restoring the first actual log file:
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9
> WAL file path            : /var/lib/pgsql/incoming_wal/0000000100000001000000D9
> Restoring to...          : pg_xlog/RECOVERYXLOG
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
>
> Since it says 'OK' but then fails, my guess is that the order of
> operations goes something along the lines of this (I could be totally
> off):
>
> 1. Is /var/lib/pgsql/incoming_wal/0000000100000001000000D9 present? -> OK
> 2. Clean up files older than 0000000100000001000000DB -> Delete
> /var/lib/pgsql/incoming_wal/0000000100000001000000D9 and
> /var/lib/pgsql/incoming_wal/0000000100000001000000DA
> 3. Restore /var/lib/pgsql/incoming_wal/0000000100000001000000D9 -> This is
> where it breaks.
>
> So, the question is:  why does the server say that it only needs
> 0000000100000001000000DB and later?  Did you clear out your pg_xlog
> directory before starting up the standby?
>

Yes, the value passed as %r looks wrong from the start.
In general I like %r: I don't have to worry about whether enough WALs
are left to continue recovery after a standby reboot, and I don't have
to keep many files around at once. But something seems to be wrong
with it here.
And to answer your question: I didn't delete any files before starting
the standby.
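
To make the suspected failure concrete, here is a minimal Python sketch of
the cleanup step in the order Erik guessed. It is not pg_standby's actual
code: I'm assuming that cleanup removes every 24-character hex segment
name that sorts before the %r value. The archive listing is made up from
the file names in the log above (the DB segment is assumed to be present),
and the function names are hypothetical.

```python
# Hypothetical sketch of the suspected cleanup logic; NOT pg_standby's source.
# Assumption: a file is a cleanup candidate only if its name is a plain WAL
# segment name (24 uppercase hex characters) and sorts before the %r value.

WAL_NAME_LEN = 24
HEX_DIGITS = set("0123456789ABCDEF")


def is_wal_segment(name):
    """True for plain segment names; .history and .backup files do not match."""
    return len(name) == WAL_NAME_LEN and set(name) <= HEX_DIGITS


def files_to_remove(archive_files, restart_file):
    """Return the segment files that sort before restart_file (the %r value)."""
    return sorted(
        f for f in archive_files
        if is_wal_segment(f) and f < restart_file
    )


# Assumed archive contents, built from the names that appear in the log.
archive = [
    "00000001.history",
    "0000000100000001000000D9.00000020.backup",
    "0000000100000001000000D9",
    "0000000100000001000000DA",
    "0000000100000001000000DB",
]

requested = "0000000100000001000000D9"  # %f: the file recovery asks for
restart = "0000000100000001000000DB"    # %r: "Keep archive history" value

doomed = files_to_remove(archive, restart)
print(doomed)               # ['0000000100000001000000D9', '0000000100000001000000DA']
print(requested in doomed)  # True
```

With %f = ...D9 and %r = ...DB, the requested segment is itself in the
removal list, which matches the two "removing" lines in the output.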

So it looks like a bug to me. I should probably report it to
pgsql.bugs. Unfortunately (or fortunately :D) my test environment
is in production now, so I won't be able to reproduce it easily.