Re: are WAL file segment boundaries a point of consistency?

Jeff Janes <jeff.janes@xxxxxxxxx> · Mon, 9 Sep 2013 10:22:16 -0700

On Fri, Sep 6, 2013 at 1:26 PM, John Lumby <johnlumby@xxxxxxxxxxx> wrote:
> We use log-shipping replication and have recently noticed a nasty bug
> where, in certain very rare cases, the primary's archive_command program
> fails to send the WAL file to the standby but still reports success
> (return code 0) to PostgreSQL.
> In such cases, if the standby then triggers its exit from recovery mode,
> it comes up in normal, accessible mode but is missing the log records
> from that last WAL file.
>
> This is a bug in our code, which we will fix, but I am wondering whether
> it means there is a possibility of something worse than missing some
> updates. I.e., could it leave this former standby cluster with a corrupt
> database (e.g. an index entry with no matching heap slot, or worse)?
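The root cause described above is an archive_command that exits 0 without having stored the segment. PostgreSQL treats exit status 0 as "archived" and may then recycle the file, so the command must return zero if and only if it succeeded. Below is a minimal sketch of such a helper, assuming a POSIX filesystem and that the standby's archive is reachable as a local (e.g. mounted) path. The script name and archive path are hypothetical, not anything from the original setup. It copies to a temporary name, fsyncs, verifies the size, then renames and fsyncs the directory, and it refuses to overwrite an existing archive file:

```python
# Hypothetical archive_command helper (script name and paths are illustrative):
#   archive_command = '/usr/local/bin/safe_archive.py %p /mnt/standby_archive/%f'
import os
import shutil
import sys


def archive(src: str, dest: str) -> None:
    """Copy src to dest so that dest exists only once it is complete and durable."""
    if os.path.exists(dest):
        # Never overwrite a segment that was already archived.
        raise FileExistsError(dest)
    tmp = dest + ".tmp"
    shutil.copyfile(src, tmp)
    # Flush the copied data to disk before exposing it under its final name.
    with open(tmp, "r+b") as f:
        os.fsync(f.fileno())
    if os.path.getsize(tmp) != os.path.getsize(src):
        os.unlink(tmp)
        raise OSError("size mismatch for " + dest)
    os.rename(tmp, dest)
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dfd = os.open(os.path.dirname(dest) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


def main(argv) -> int:
    """Return 0 only if the segment was archived; any other value means failure."""
    if len(argv) != 3:
        print("usage: safe_archive.py SRC DEST", file=sys.stderr)
        return 2
    try:
        archive(argv[1], argv[2])
    except Exception as exc:
        # A nonzero exit makes PostgreSQL keep the segment and retry later.
        print("archive failed: %s" % exc, file=sys.stderr)
        return 1
    return 0
```

To run it as the actual archive_command, end the file with `sys.exit(main(sys.argv))`. The important design choice is that every failure path, including a missing source file or a short copy, turns into a nonzero exit instead of a silent 0.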

As long as the standby ever reached consistency in the first place,
then it should not lose it due to this issue. Once consistency is
reached, changes to the data files are driven only by replay of the
WAL records, and those should only take the database from one
consistent state to another.

Where you risk corruption is if the problem occurred while you were
taking the base backup. Then some of the copied base files might
already contain data from the "future", but that future cannot be
reached because recovery stops early at the lost file. The database
should detect this situation and refuse to start, forcing you to
retake the base backup or use an earlier one. But there were known
bugs in this general area, some of which were fixed in 9.2.3.
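Because recovery stops at the first segment it cannot restore, a hole in the archive silently truncates replay. A minimal sketch of a check for such holes in an archive directory listing follows. It assumes the standard 24-hex-digit segment names (timeline, log, segment) and default 16 MB segments; `find_wal_gaps` is a hypothetical helper, not a PostgreSQL tool. It only finds gaps between the lowest and highest segment seen per timeline, so it cannot tell you that the final segment is missing, which is exactly the case in the original report.

```python
import re

# WAL segment names: 8 hex digits timeline, 8 hex digits "log", 8 hex digits segment.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def find_wal_gaps(names, segs_per_log=0x100):
    """Return segment names missing between the lowest and highest archived
    segment of each timeline.

    segs_per_log is 0x100 for 16 MB segments on 9.3 and later; pass 0xFF for
    9.2 and earlier, where the ...FF segment of each log was never used.
    """
    by_tli = {}
    for name in names:
        m = WAL_NAME.match(name)
        if not m:
            continue  # skip .history, .backup and other non-segment files
        tli, log, seg = (int(g, 16) for g in m.groups())
        by_tli.setdefault(tli, set()).add(log * segs_per_log + seg)
    missing = []
    for tli, segnos in sorted(by_tli.items()):
        for segno in range(min(segnos), max(segnos) + 1):
            if segno not in segnos:
                log, seg = divmod(segno, segs_per_log)
                missing.append("%08X%08X%08X" % (tli, log, seg))
    return missing
```

For example, `find_wal_gaps(os.listdir(archive_dir))` run before triggering the standby would flag a hole in the middle of the archived range; timelines are checked independently, so gaps across a timeline switch are not detected.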

Cheers,

Jeff
