Re: Index corruption after proper shut down

Strahinja Kustudić <strahinjak@xxxxxxxxxxx> · Fri, 22 Nov 2013 21:39:57 +0100

Sorry for replying to myself, but does anyone have any idea what the problem could be? We would like to do some testing based on your suggestions to see what could be causing this and how to mitigate it.

Regards,
Strahinja

On Fri, Nov 15, 2013 at 11:44 AM, Strahinja Kustudić <strahinjak@xxxxxxxxxxx> wrote:

Hi all,

Last week we migrated 200+ of our servers from one rack to another. The procedure was dead simple: power off the server from the OS, unplug it, move it to a different rack, plug it in, and start it. The problem was that after booting, some of the servers had corrupted indexes.

The servers are Dell PowerEdge R420 with an H700 RAID controller with BBU, running CentOS 5.9 x64 with Postgres 9.1.9 on two Intel 330 120GB SSDSC2CT120 drives (one for data and one for indexes) on XFS (noatime,nobarrier,noquota). The relevant Postgres configuration is:

wal_level = minimal
fsync = on
wal_sync_method = fdatasync
full_page_writes = on
synchronous_commit = off
wal_buffers = -1
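
For anyone who wants to compare these values across many servers, here is a minimal sketch in Python. It only parses a postgresql.conf-style fragment and pulls out fsync and full_page_writes, the two settings that, as far as I understand, protect against corruption after a crash or power loss. The helper name parse_conf and the CONF string are made up for illustration, and the parser is deliberately simplified (it does not handle a # inside a quoted value).

```python
# Hypothetical helper (not part of Postgres): parse a postgresql.conf-style
# fragment into a dict of setting name -> value.
def parse_conf(text):
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


# The configuration fragment quoted in this message.
CONF = """
wal_level = minimal
fsync = on
wal_sync_method = fdatasync
full_page_writes = on
synchronous_commit = off
wal_buffers = -1
"""

settings = parse_conf(CONF)

# Assumption: these two settings are the ones most relevant to crash safety.
crash_safety = {name: settings[name] for name in ("fsync", "full_page_writes")}
print(crash_safety)
```

Both are on here. According to the Postgres documentation, synchronous_commit = off can lose the most recent transactions after a crash but does not by itself put database consistency at risk, so I would not expect it to be the cause.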

We also disabled the disk write cache on all drives with the MegaCli64 utility, because the RAID controller has a BBU and should be the one doing the caching.

Does anyone have any idea why we could be getting index corruption?

Thanks in advance

Regards,
Strahinja