Two days ago, we started getting panics on a hot-standby replica as follows:
2015-10-24 14:16:46.489 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-24 14:16:46.490 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/254063; dead 11796080; left 1365037; right 3024097; btpo_xact 64542957; leaf 2456241; leafleft 11130443; leafright 1350594; topparent 4294967295
2015-10-26 04:51:40.530 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-26 04:51:40.530 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/254063; dead 9922828; left 2449142; right 3415026; btpo_xact 64982371; leaf 2290440; leafleft 5120238; leafright 1903321; topparent 4294967295
2015-10-26 10:24:02.613 UTC PANIC: corrupted page pointers: lower = 17, upper = 0, special = 8176
2015-10-26 10:24:02.613 UTC CONTEXT: xlog redo unlink_page: rel 1663/16416/401628; dead 2348571; left 2348281; right 2351431; btpo_xact 65010718; leaf 2348740; leafleft 2348434; leafright 2351568; topparent 4294967295
The replica runs on a dedicated EC2 instance and had been running without any problems for several months before these panics began. The PostgreSQL package is version 9.4.4-1.pgdg14.04+1 from the apt repository, running on Ubuntu 14.04 Trusty. The database is around 440 GB and is under constant moderate read-only load (100-1000 queries per second).
There have been no issues with the master database, nor have there been any database shutdowns other than the panics.
I would be very grateful for any insights as to what may have caused this, and how best to recover stable operation.
Best regards,
Michael Robinson