Willy-Bas Loos <willybas@xxxxxxxxx> writes:
> However, our test database cluster seems to be broken.
> When i try to start the cluster it says:
> 2015-11-16 15:06:35 CET db: ip: us: LOG: database system was interrupted
> while in recovery at 2015-11-16 13:05:41 CET
> 2015-11-16 15:06:35 CET db: ip: us: HINT: This probably means that some
> data is corrupted and you will have to use the last backup for recovery.
> 2015-11-16 15:06:35 CET db: ip: us: LOG: database system was not properly
> shut down; automatic recovery in progress
> 2015-11-16 15:06:35 CET db: ip: us: PANIC: unexpected pageaddr A/6E786000
> in log segment 000000010000000A00000084, offset 7888896
> 2015-11-16 15:06:35 CET db: ip: us: LOG: startup process (PID 11634) was
> terminated by signal 6: Aborted

The offsets match up (7888896 = 0x786000) so evidently what you've got
here is a page in a WAL file that is still from a previous cycle of life,
ie this WAL file used to be 000000010000000A0000006E and has been
recycled, but this particular page hasn't been rewritten yet.

The fact that this is a PANIC and not just normal end of recovery implies
that the pg_control checkpoint WAL-replay-start address is pointing here.

In short, therefore, what you've got here is an indication that a
checkpoint update of pg_control happened before the corresponding WAL had
been flushed to disk. Which means that your filesystem stack is not
honoring fsync semantics properly, because PG would certainly have fsync'd
the WAL first (unless you turned off fsync ...)

So I'd say your test database served its purpose as a coal mine canary,
by letting you know that there's something rotten in your filesystem
and/or RAID hardware. I'd suggest doing actual plug-pull tests when you
think you've got that resolved, rather than just taking it on faith that
it works.

> The cluster is on an OpenVZ container and runs Ubuntu 14.04
> The postgresql version is 9.3
> 3 other clusters on the same container are fine.
> We use a hardware RAID10 of SATA disks with a BBU (and writeBack mode)

Right offhand, I wonder whether the weak link in that isn't OpenVZ.
Virtualization technologies have a long and ugly reputation for not
providing strong filesystem-integrity guarantees.

			regards, tom lane
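
The offset arithmetic in the reply above can be checked with a short
sketch. It assumes the default 16 MB WAL segment size and the 9.3-style
segment file naming (timeline, then the high and low parts of the segment
number, each as 8 hex digits). The function names are hypothetical helpers
written for this illustration, not PostgreSQL APIs.

```python
# Assumption: default WAL segment size of 16 MB (the 9.3 build default).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments covered by one 32-bit "log id" (256 at 16 MB).
SEGMENTS_PER_LOG_ID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text):
    """Turn an 'A/6E786000' style address into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(timeline, lsn):
    """File name of the WAL segment that contains the given address."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_LOG_ID,
        segno % SEGMENTS_PER_LOG_ID,
    )


def segment_offset(lsn):
    """Byte offset of the address within its segment file."""
    return lsn % WAL_SEGMENT_SIZE


# The page address reported in the PANIC message.
stale = parse_lsn("A/6E786000")
print(segment_name(1, stale))    # segment this page was originally written for
print(segment_offset(stale))     # offset of the page inside that segment
print(hex(segment_offset(stale)))
```

Under those assumptions the page address decodes to the same in-file
offset that the PANIC reports for segment 000000010000000A00000084, and to
the older segment name given in the reply, which is what identifies the
page as a leftover from before the file was recycled.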