Re: How to continue streaming replication after this error?

Haribabu Kommi <kommi.haribabu@xxxxxxxxx> · Mon, 24 Feb 2014 13:43:18 +1100

On Sat, Feb 22, 2014 at 1:21 PM, Torsten Förtsch <torsten.foertsch@xxxxxxx> wrote:

On 21/02/14 09:17, Torsten Förtsch wrote:

> one of our streaming replicas died with
>
> 2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
> 2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
> 2014-02-21 05:17:11 UTC LOG:  startup process (PID 1060) was terminated by signal 6: Aborted
> 2014-02-21 05:17:11 UTC LOG:  terminating any other active server processes
> 2014-02-21 05:17:11 UTC WARNING:  terminating connection because of crash of another server process
> 2014-02-21 05:17:11 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2014-02-21 05:17:11 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.

Any idea what that means?

I have got a second replica dying with the same symptoms.

The xlog (WAL) record seems to be corrupted. Op code 32 (0x20) is XLOG_HEAP2_FREEZE_PAGE, and code exists to handle it, so I don't know why the system is not able to recognize it. Can you try running pg_xlogdump on the corrupted WAL file?
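
For reference, the 32 in the PANIC line is a decimal value, while the op codes are usually written in hex in the source. Here is a minimal Python sketch of the decoding, not the server's actual code. The two mask values are my recollection of the 9.3 headers and should be checked against the headers of the version you run; the function name and the lookup table are made up for illustration, and only the 0x20 entry comes from this thread.

```python
# Assumption: mask values as recalled from the PostgreSQL 9.3 headers;
# verify them against the source of the exact version in use.
XLR_INFO_MASK = 0x0F      # low bits reserved for the xlog machinery (assumed)
XLOG_HEAP_OPMASK = 0x70   # bits selecting the heap/heap2 operation (assumed)

# Hypothetical lookup table. Only this one entry is taken from the thread;
# other op codes are deliberately left out rather than guessed.
HEAP2_OPS = {0x20: "XLOG_HEAP2_FREEZE_PAGE"}


def describe_heap2_op(code: int) -> str:
    """Map the decimal op code from the log message to a symbolic name."""
    # Drop the reserved low bits, then keep only the operation bits.
    op = code & ~XLR_INFO_MASK & XLOG_HEAP_OPMASK
    name = HEAP2_OPS.get(op, "not in this table")
    return f"op code {code} -> 0x{op:02X} ({name})"


if __name__ == "__main__":
    print(describe_heap2_op(32))
```

Running it with 32 gives 0x20, which matches XLOG_HEAP2_FREEZE_PAGE as mentioned above.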

Keep the data folder for problem investigation. Since this looks like some kind of corruption, you will need to take a fresh base backup to continue.

Regards,
Hari Babu
Fujitsu Australia