I'm running PostgreSQL 9.3.5 on Ubuntu 14.04 on x86_64. The database
directory is on a Linux mdadm RAID10 array of four 4 TB disks with a
far=2 layout. While the RAID tolerates a single drive failure nicely, I
had the misfortune of two drives failing in succession, one of which had
many reallocated sectors and began failing SMART criteria. That drive
has since been removed. As a result, some files were corrupted.

I was getting the following error on some tables:

ERROR: could not read block 0 in file "base/27810/3995569": Input/output error

After dropping those tables, the errors went away.

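For reference, the path in that message has the form
base/<database OID>/<relfilenode>, assuming the relation lives in the
default tablespace, and that is how affected relations can be identified.
Below is a minimal sketch of the mapping. The helper names are my own
(hypothetical), and the queries only help while the pg_class row for the
relation is still readable and the relation is not a mapped catalog
(those show relfilenode 0 in pg_class).

```python
import re

# Relative data-file path as printed by PostgreSQL for the default
# tablespace: base/<database OID>/<relfilenode>[.<segment number>]
PATH_RE = re.compile(r'base/(\d+)/(\d+)(?:\.(\d+))?')


def parse_relation_path(message):
    """Return (database_oid, relfilenode, segment) or None if no path is found."""
    m = PATH_RE.search(message)
    if m is None:
        return None
    db_oid, filenode, segment = m.groups()
    # A missing ".N" suffix means the first 1 GB segment (segment 0).
    return int(db_oid), int(filenode), int(segment or 0)


def lookup_queries(db_oid, filenode):
    """Build catalog queries naming the database and the relation (hypothetical helper)."""
    return (
        "SELECT datname FROM pg_database WHERE oid = %d;" % db_oid,
        # Run this one while connected to the database found by the first query.
        "SELECT oid::regclass AS relation, relkind FROM pg_class "
        "WHERE relfilenode = %d;" % filenode,
    )


msg = 'ERROR: could not read block 0 in file "base/27810/3995569": Input/output error'
print(parse_relation_path(msg))  # (27810, 3995569, 0)
for query in lookup_queries(27810, 3995569):
    print(query)
```

In my case the tables are already dropped, so the second query would
return nothing for that file; it is only useful before dropping.
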
The situation appears to be stable now, but when I run REINDEX and
VACUUM on one of the databases, I get the following warnings:

WARNING: relation "pg_attrdef" TID 1/1: OID is invalid
WARNING: relation "pg_attrdef" TID 1/2: OID is invalid
WARNING: relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup? My most recent
backup is from late September, so I would lose some data. I also backed
up what I could as soon as the disks started giving errors, but I don't
know whether I can trust that backup.

Or should I drop the entire cluster?

Regarding hardware, I'm going to add hot spare drives to the array to
reduce the chance of this happening again.

Thanks in advance for your advice.

Regards,
Gabriel