Failover Testing Failures: invalid resource manager ID in primary checkpoint record


PostgreSQL 12.13 (PGDG packages) in a streaming replication configuration, with pgBackRest 2.43 used for WAL archiving and database backups to cloud storage.

I'm testing and documenting a DR exercise process where I:
  1. Cleanly shut down PG on the primary
  2. Promote the PG DR replica
  3. Place the standby.signal file on the old primary and start it up (this presumes no other configuration needs changing; primary_conninfo etc. were already set).
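
For readers following along, the three steps above can be sketched as an ordered command plan. This is only an illustration under assumptions: the data directory paths are hypothetical, the helper name switchover_plan is made up for this example, and the code only builds and prints the commands rather than running them. pg_ctl stop/promote/start and the standby.signal file are standard in PostgreSQL 12.

```python
from pathlib import PurePosixPath


def switchover_plan(old_primary_pgdata, replica_pgdata):
    """Return the DR exercise as ordered (host role, command) pairs.

    Nothing is executed here; the caller decides how to run each
    command on the right host (ssh, configuration management, by hand).
    """
    old = PurePosixPath(old_primary_pgdata)
    new = PurePosixPath(replica_pgdata)
    return [
        # 1. Cleanly shut down PostgreSQL on the primary.
        ("old primary", ["pg_ctl", "stop", "-D", str(old), "-m", "fast"]),
        # 2. Promote the DR replica.
        ("replica", ["pg_ctl", "promote", "-D", str(new)]),
        # 3. Mark the old primary as a standby, then start it.
        ("old primary", ["touch", str(old / "standby.signal")]),
        ("old primary", ["pg_ctl", "start", "-D", str(old)]),
    ]


if __name__ == "__main__":
    # Hypothetical PGDG-style data directories, used only for illustration.
    plan = switchover_plan("/var/lib/pgsql/12/data", "/var/lib/pgsql/12/data")
    for host, command in plan:
        print(f"[{host}] {' '.join(command)}")
```
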
My hope is that I can simply start the old primary as the new replica, since it was cleanly shut down before the replica was promoted. However, when I try to start that new replica, I'm met with:

LOG:  restored log file "00000002000000B70000005A" from archive
LOG:  invalid resource manager ID in primary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 17660) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down
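
As a reading aid for the log above: a WAL segment file name is 24 hexadecimal digits, made up of an 8-digit timeline ID, an 8-digit log number, and an 8-digit segment number. The sketch below decodes the restored file name; the helper names are made up for this example, and the LSN calculation assumes the default 16 MB wal_segment_size.

```python
import string
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int
    log: int
    segment: int


def parse_wal_filename(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24 or any(c not in string.hexdigits for c in name):
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


def segment_start_lsn(seg: WalSegment, wal_segment_size: int = 16 * 1024 * 1024) -> str:
    """First LSN covered by the segment, assuming the given segment size."""
    return f"{seg.log:X}/{seg.segment * wal_segment_size:08X}"


if __name__ == "__main__":
    seg = parse_wal_filename("00000002000000B70000005A")
    print(seg)                     # timeline=2, log=183, segment=90
    print(segment_start_lsn(seg))  # B7/5A000000
```

The leading 00000002 means the restored file belongs to timeline 2. If the old primary was on timeline 1 before the promotion (an assumption, since the post does not state it), that segment was written on the promoted replica's timeline rather than the old primary's.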

It doesn't appear that any WAL files are missing, as it finds every file it asks for. Am I missing a piece here?

My hope is to avoid having to do a restore to rebuild the new replica.

An aside for those who may be asking: most of these databases do not have data checksums enabled, so pg_rewind isn't in the picture. Although I'm reading now that we could enable the wal_log_hints parameter as an alternative. I'm leery of the overhead, but if it's the same overhead that data checksums would incur, then I guess nothing would be lost when we eventually enable them.

Don Seiler
