On 2/5/18 9:06 AM, Ray Stell wrote:
I built a standby with 9.4.12 and about a day later the standby
crashed with this:
2018-02-02 16:20:44 EST,0, WARNING: page 1347460 of relation
base/16391/16414 is uninitialized
2018-02-02 16:20:44 EST,0, CONTEXT: xlog redo visible: rel
1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, PANIC: WAL contains references to invalid
pages
2018-02-02 16:20:44 EST,0, CONTEXT: xlog redo visible: rel
1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, LOG: startup process (PID 24057) was
terminated by signal 6: Aborted
2018-02-02 16:20:44 EST,0, LOG: terminating any other active server
processes
Any hints as to where the corruption begins? I don't see any disk I/O
issues. I'm not sure what to look for in the release notes. I'll try
to patch ASAP, though that is politically difficult to get done.
I'm beginning to wonder about pg_basebackup in this old version. I rebuilt
the standby again, and this time when I started it up I got:
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 2F45/1F4B7F8
FATAL: could not access status of transaction 4053124744
DETAIL: Could not read from file "pg_clog/0F19" at offset 90112: Success.
CONTEXT: xlog redo commit: 2018-02-05 11:35:54.291398-05
LOG: startup process (PID 130590) exited with exit code 1
LOG: terminating any other active server processes
Right or wrong, I rsync-ed pg_clog and it recovered. Can a pg_basebackup
from a more recent patch level be used against a 9.4.12 cluster?
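
For reference, the segment file and offset in the FATAL message above follow
directly from the transaction ID. Here is a minimal sketch of the arithmetic,
assuming the default 8 kB block size and the 9.4-era clog layout (2 status
bits per transaction, 32 pages per segment file). The function name
clog_location is only for illustration and is not part of PostgreSQL.

```python
# Assumed constants: default build settings for PostgreSQL 9.4.
BLCKSZ = 8192                  # bytes per clog page
XACTS_PER_BYTE = 4             # 2 status bits per transaction
PAGES_PER_SEGMENT = 32         # SLRU pages per segment file

XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE


def clog_location(xid):
    """Return (segment file name, byte offset of the page) for a given xid."""
    page = xid // XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    page_in_segment = page % PAGES_PER_SEGMENT
    # Segment files are named with four upper-case hex digits.
    return "%04X" % segment, page_in_segment * BLCKSZ


if __name__ == "__main__":
    name, offset = clog_location(4053124744)
    print("pg_clog/%s at offset %d" % (name, offset))
```

For xid 4053124744 this gives pg_clog/0F19 and offset 90112, which matches
the log. The trailing "Success" typically means the read came back short
rather than failing with an I/O error, so the file on the standby was
probably smaller than 90112 + 8192 bytes at that point; comparing the file
size on both nodes would confirm or rule that out.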