Re: Critical failure of standby

Jeff Janes <jeff.janes@xxxxxxxxx> · Sat, 20 Aug 2016 12:12:30 -0700

On Mon, Aug 15, 2016 at 7:23 PM, James Sewell <james.sewell@xxxxxxxxxxxx> wrote:
Those are all good questions.
Essentially this is a situation where DR is network separated from Prod - so I would expect the archive command to fail.

archive_command or restore_command?  I thought it was restore_command.

I'll have to check the script; it must not be passing the error back through to PostgreSQL.
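For illustration, here is a minimal sketch of a restore script that does pass failures back. It is not the script from this thread (which nobody has posted); the function name restore_wal and the archive directory layout are made up for the example. The points it tries to show are that any failure turns into a nonzero status, which is what PostgreSQL expects from restore_command, and that the file is copied to a temporary name first so an interrupted copy never leaves a partial segment at the destination.

```python
import os
import shutil
import sys
import tempfile


def restore_wal(archive_dir, wal_name, dest_path):
    """Copy one WAL segment from archive_dir to dest_path.

    Returns 0 on success and 1 on any failure, so the caller can hand
    that status straight back to PostgreSQL as the exit code.
    (Hypothetical helper; names and layout are assumptions.)
    """
    src = os.path.join(archive_dir, wal_name)
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    try:
        # Write to a temporary file in the destination directory first.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".restore-")
    except OSError as exc:
        print(f"cannot create temp file in {dest_dir}: {exc}", file=sys.stderr)
        return 1
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        # Rename only after the whole segment has been written.
        os.replace(tmp_path, dest_path)
        return 0
    except OSError as exc:
        # Missing file, unreachable archive, full disk, and so on:
        # report it and clean up instead of pretending to succeed.
        print(f"restore failed for {wal_name}: {exc}", file=sys.stderr)
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return 1


def main(argv):
    # Expected arguments: archive directory, %f (file name), %p (target path).
    if len(argv) != 4:
        return 2
    return restore_wal(argv[1], argv[2], argv[3])
```

A real script would end with sys.exit(main(sys.argv)) and be wired up with something like restore_command = 'python3 /path/to/restore_wal.py /mnt/archive %f %p', where both paths are placeholders.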

This still shouldn't cause database corruption though, right? It's just not getting WALs.

If the WAL it does have is corrupt, and it can't replace that with a good copy because the command is failing, then what else is it going to do?

If the original WAL transfer got interrupted mid-stream, then you will have a bad record in the middle of the WAL.  If by some spectacular stroke of bad luck, the CRC checksum on that bad record happens to collide, then it will try to decode that bad record.
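To make the checksum point concrete, here is a toy sketch. The record layout (a 4-byte length, a 4-byte CRC, then the payload) and the use of zlib.crc32 are assumptions for the example, not PostgreSQL's actual WAL format or CRC variant. A reader that verifies the checksum rejects a record that was cut short or overwritten; only a record whose damaged bytes happen to produce the same 32-bit value would slip through, which for random damage is roughly a 1 in 2^32 chance.

```python
import struct
import zlib


def make_record(payload: bytes) -> bytes:
    # Toy layout (hypothetical): little-endian length and CRC-32, then payload.
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def read_record(buf: bytes):
    """Return the payload if the record is complete and its CRC matches, else None."""
    if len(buf) < 8:
        return None  # header itself is incomplete
    length, crc = struct.unpack("<II", buf[:8])
    payload = buf[8:8 + length]
    if len(payload) != length:
        return None  # transfer stopped before the end of the record
    if zlib.crc32(payload) != crc:
        return None  # bytes are present but do not match the checksum
    return payload


good = make_record(b"INSERT row 42 into accounts")
truncated = good[:-4]                 # transfer cut off mid-record
overwritten = good[:-4] + b"\x00" * 4  # tail replaced by zero filler

intact_result = read_record(good)
truncated_result = read_record(truncated)
overwritten_result = read_record(overwritten)
```

Here intact_result is the original payload, while both damaged variants come back as None, which is the normal case: replay stops at the bad record rather than decoding it.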

Cheers,

Jeff