Re: trying to run PITR recovery

"Simon Riggs" <simon@xxxxxxxxxxxxxxx> · Fri, 30 Mar 2007 18:10:30 +0100

On Fri, 2007-03-30 at 12:40 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@xxxxxxxxxxxxxxx> writes:
> > I think there is a problem here. If we stop before the end of logs we
> > should be incrementing the timeline id.
> 
> There is no good reason here to think that we have stopped before the
> end of logs, and I don't think I want the code bumping the timeline ID
> on every crash restart.

The timeline is a protection against confusing ourselves when we have
two log files both called the same thing. In the OP's case, there were
clearly unapplied log files that end up as duplicates. I can see the
difficulty in knowing whether or not to bump the timeline id.

At the very least we need to document this, since if the manual's advice
were taken ("The archive command should generally be designed to refuse
to overwrite any pre-existing archive file.") then the OP's system would
start throwing errors when the first xlog file fills after the recovered
system re-enters normal operation.
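
To make the failure mode concrete, here is a minimal sketch of an archive
command that follows the manual's advice. The function name archive_wal
and its argument layout are made up for illustration and are not part of
PostgreSQL; the only behaviour relied on is that a non-zero exit status
from archive_command is treated as a failure.

```python
import os
import shutil


def archive_wal(wal_path, archive_dir):
    """Copy one WAL file into archive_dir, refusing to overwrite.

    Returns 0 on success and 1 if a file of the same name is already
    archived. (Hypothetical helper, for illustration only.)
    """
    dest = os.path.join(archive_dir, os.path.basename(wal_path))
    try:
        # O_EXCL makes the create fail if dest already exists, so a
        # pre-existing archive file is never overwritten.
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return 1
    with os.fdopen(fd, "wb") as out, open(wal_path, "rb") as src:
        shutil.copyfileobj(src, out)
    return 0
```

Wrapped in a small script that passes the return value to sys.exit() and
is called with %p and the archive directory, this would return 1 for
every segment name that is already in the archive, which is exactly what
the recovered server in the OP's case would run into.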

We should say: 

"During recovery it is possible, if you're unlucky, that one of the WAL
files has been damaged. If so, recovery will stop at the point at which
the damage occurred. It is probable that WAL files higher than the
damaged WAL file exist in the archive. If that is the case, you may need
to begin archiving to a different location, or move those pre-existing
WAL files out of the archive, to allow the newly restored server to
continue archive operations correctly. If you don't, the server will
operate normally but further archiving may not occur correctly. Take
good care of your archived WAL files or, better still, keep two copies."
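
As a rough sketch of the "move them out of the archive" option: the
helper names, the holding directory and the stop_segment argument below
are all assumptions for illustration. It relies only on WAL segment file
names being 24 upper-case hex digits (timeline, then log and segment
number), so that names on the same timeline sort in WAL order.

```python
import os
import re
import shutil

# A WAL segment file name is exactly 24 upper-case hex digits.
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def later_segments(archive_dir, stop_segment):
    """List archived segments on the same timeline as stop_segment
    whose names sort after it. (Hypothetical helper.)"""
    timeline = stop_segment[:8]
    return sorted(
        name
        for name in os.listdir(archive_dir)
        if WAL_NAME.fullmatch(name)
        and name[:8] == timeline
        and name > stop_segment  # fixed-width hex sorts in numeric order
    )


def move_aside(archive_dir, stop_segment, holding_dir):
    """Move those later segments into holding_dir; return their names."""
    os.makedirs(holding_dir, exist_ok=True)
    moved = later_segments(archive_dir, stop_segment)
    for name in moved:
        shutil.move(os.path.join(archive_dir, name),
                    os.path.join(holding_dir, name))
    return moved
```

Moving rather than deleting keeps the old files around in case they are
wanted later. Whether the damaged segment itself also needs to be moved
is left to the operator; the sketch only handles the files higher than
it.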

-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com