Re: uninitialized page in standby recovery

Ray Stell <stellr@xxxxxx> · Mon, 5 Feb 2018 14:42:28 -0500

On 2/5/18 9:06 AM, Ray Stell wrote:

I built a standby with 9.4.12 and about a day later the standby 
crashed with this:

2018-02-02 16:20:44 EST,0, WARNING:  page 1347460 of relation base/16391/16414 is uninitialized
2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, PANIC:  WAL contains references to invalid pages
2018-02-02 16:20:44 EST,0, CONTEXT:  xlog redo visible: rel 1663/16391/16414; blk 1347460
2018-02-02 16:20:44 EST,0, LOG:  startup process (PID 24057) was terminated by signal 6: Aborted
2018-02-02 16:20:44 EST,0, LOG:  terminating any other active server processes

Any hints as to where the corruption begins?  I don't see any disk I/O 
issues.  I'm not sure what to look for in the release notes, but I'll 
try to patch ASAP, though that is politically difficult to get done.

I'm beginning to wonder about pg_basebackup in this old version.  I 
rebuilt the standby again, and this time when I started it up I got:

LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 2F45/1F4B7F8
FATAL:  could not access status of transaction 4053124744
DETAIL:  Could not read from file "pg_clog/0F19" at offset 90112: Success.
CONTEXT:  xlog redo commit: 2018-02-05 11:35:54.291398-05
LOG:  startup process (PID 130590) exited with exit code 1
LOG:  terminating any other active server processes
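
For reference, the file name and offset in the DETAIL line can be derived from the transaction ID. This is a sketch under stated assumptions: the default 8 kB BLCKSZ and the 9.4-era CLOG layout (2 status bits per transaction, so 4 transactions per byte, and 32 pages per pg_clog segment file). The function name clog_location is made up for illustration. Under those assumptions XID 4053124744 maps exactly to pg_clog/0F19 at offset 90112, which suggests the standby's copy of that file simply ended before the page holding this transaction (the trailing "Success." is what a short read with no errno set tends to look like).

```python
# Assumptions (not confirmed in this thread): default BLCKSZ of 8192 bytes and
# the 9.4-era CLOG layout of 2 status bits per transaction (4 per byte) with
# 32 pages per SLRU segment file. clog_location() is a hypothetical helper.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE  # 32768 transactions per page
SLRU_PAGES_PER_SEGMENT = 32


def clog_location(xid: int) -> tuple[str, int]:
    """Return (segment file name, byte offset of the page) holding xid's status."""
    page = xid // CLOG_XACTS_PER_PAGE               # global CLOG page number
    segment = page // SLRU_PAGES_PER_SEGMENT        # which pg_clog file
    page_in_segment = page % SLRU_PAGES_PER_SEGMENT  # page within that file
    return f"{segment:04X}", page_in_segment * BLCKSZ


if __name__ == "__main__":
    name, offset = clog_location(4053124744)
    print(f"pg_clog/{name} at offset {offset}")  # pg_clog/0F19 at offset 90112
```

If the reported offset lines up like this, comparing the size of that segment on the primary and on the standby is a quick way to confirm a truncated copy.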

Right or wrong, I rsync-ed pg_clog and it recovered.  Can you use 
pg_basebackup from a newer patch level against a 9.4.12 cluster?