On Apr 18, 2006, at 12:30 PM, Thomas F. O'Connell wrote:
> So there are currently three separate relations exhibiting invalid
> page errors.
>
> This box is a Debian 3.1 box running a custom Linux 2.6.10 #6 SMP
> kernel. Postgres 8.1.3 was compiled from source. pgpool 3.0.1, also
> built from source, is used by some parts of the application layer.
>
> The system is running on an ext3 filesystem, WAL is on a 4-disk
> RAID 10 running JFS, and data is on a 12-disk RAID 10 running JFS.
>
> I'm not seeing any signs of apparent kernel or hardware errors in
> the system and kernel logs.

I take back what I said about the lack of errors. megamgr is now
reporting 5 (!) failed drives on a single channel in the RAID 10 that
holds the data. The RAID card is a MegaRAID SCSI 320-2X.

I would've expected the RAID to protect postgres from the possibility
of data corruption, but I guess not.

In any event, we're working on replacing the failed drives. After the
RAID is rebuilt, though, the focus will be on the data. Is my best bet
to restore the corrupted relations, or can I repair them somehow?

And I'm still concerned about whether postgres will recover if I stop
it at this point, so I'm working on contingency plans for leaving
postgres online, turning off the application, restoring the tables
while nothing is accessing postgres, and then restarting the
application. Is there a safer/better course of action available?
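
In case it helps anyone in a similar spot, here is a rough sketch of how
the affected relations and blocks could be tallied from the server log
while the drives are being replaced. The message wording in the pattern
and the relation names in the usage note are assumptions on my part, not
copied from my actual logs, so adjust the regular expression to whatever
your server really emits.

```python
import re
import sys
from collections import defaultdict

# Assumed log wording (hypothetical; adjust to match your server's messages).
PATTERN = re.compile(r'invalid page header in block (\d+) of relation "([^"]+)"')


def summarize_invalid_pages(lines):
    """Return {relation: sorted list of distinct block numbers reported as invalid}."""
    blocks = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            blocks[match.group(2)].add(int(match.group(1)))
    return {rel: sorted(nums) for rel, nums in blocks.items()}


if __name__ == "__main__":
    # Read log text from stdin and print one summary line per relation.
    for rel, nums in sorted(summarize_invalid_pages(sys.stdin).items()):
        print(f"{rel}: {len(nums)} block(s): {nums}")
```

Fed a log on stdin, this prints one line per relation with the distinct
block numbers seen, which gives a list of what would need restoring.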
--
Thomas F. O'Connell
Database Architecture and Programming
Sitening, LLC
http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)