I fear I have a corrupted database, and I'm not sure what to do.
Environment:
Windows Server 2003
8GB RAM
Dual processor, quad-core 2.6 GHz
PostgreSQL 8.2.3 (the IT department wants to upgrade to 8.2.9, but they
are asking me what to do about this corrupt database before they proceed)
The database files and logs are stored on a SAN drive.
Relevant entries from the server log:
2008-08-23 06:57:06 FATAL: could not create sigchld waiter thread:
error code 1816
*** Ack! A 13-hour hole in the log here! What the...?
2008-08-23 20:00:27 ERROR: xlog flush request E0/293CF278 is not
satisfied --- flushed only to E0/21B1B7F0
2008-08-23 20:00:27 CONTEXT: writing block 94218 of relation
16712/16713/16725
2008-08-23 20:04:36 DETAIL: Multiple failures --- write error may be
permanent.
2008-08-23 20:04:36 ERROR: xlog flush request E0/4FC5BEB8 is not
satisfied --- flushed only to E0/21B9E270
2008-08-23 20:04:36 CONTEXT: writing block 81033 of relation
16712/16713/16725
2008-08-23 20:04:36 STATEMENT: BEGIN TRANSACTION; ... just a normal SQL
stored proc...
2008-08-23 20:04:36 DETAIL: Multiple failures --- write error may be
permanent.
2008-08-23 20:04:36 ERROR: xlog flush request E0/314D8248 is not
satisfied --- flushed only to E0/21B9E358
2008-08-23 20:04:36 CONTEXT: writing block 371418 of relation
16712/16713/16719
2008-08-23 20:04:36 STATEMENT: BEGIN TRANSACTION;... just a normal SQL
stored proc...
This repeats for quite a while.
A few days later, after a restart, we are seeing the following quite
often:
2008-08-26 11:59:42 FATAL: the database system is starting up
2008-08-26 11:59:42 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 LOG: database system is ready
2008-08-26 11:59:55 PANIC: right sibling's left-link doesn't match
2008-08-26 11:59:55 STATEMENT: BEGIN TRANSACTION;INSERT INTO ...SQL
scrubbed...
This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.
2008-08-26 11:59:55 LOG: server process (PID 2228) exited with exit code 3
2008-08-26 11:59:55 LOG: terminating any other active server processes
2008-08-26 11:59:55 LOG: all server processes terminated; reinitializing
2008-08-26 11:59:55 LOG: database system was interrupted at 2008-08-26
11:59:43 Pacific Daylight Time
2008-08-26 11:59:55 LOG: checkpoint record is at E2/F88B6C0
2008-08-26 11:59:55 LOG: redo record is at E2/F88B6C0; undo record is
at 0/0; shutdown TRUE
2008-08-26 11:59:55 LOG: next transaction ID: 0/396816257; next OID: 58100
2008-08-26 11:59:55 LOG: next MultiXactId: 3; next MultiXactOffset: 5
2008-08-26 11:59:55 LOG: database system was not properly shut down;
automatic recovery in progress
2008-08-26 11:59:55 LOG: redo starts at E2/F88B710
2008-08-26 11:59:55 LOG: record with zero length at E2/F984928
2008-08-26 11:59:55 LOG: redo done at E2/F9848F8
2008-08-26 11:59:55 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 LOG: database system is ready
That section repeats over and over. Oddly enough, the system mostly
seems to be running. I need to run some diagnostics on our app to see
what is going on at that layer and what is and isn't working.
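As a starting point for those diagnostics, here is a rough sketch (in
Python; the function name and sample text are my own, nothing here is
part of Postgres) that tallies the ERROR/FATAL/PANIC lines in a log and
collects the relation paths named in the CONTEXT lines. It assumes the
log has already been read into a string and that long messages wrap
across lines the way they do above. As far as I understand, the three
numbers in a relation path are tablespace/database/relfilenode, but
treat that as an assumption to verify.

```python
import re
from collections import Counter

# Matches the relation path in lines such as
# "CONTEXT: writing block 94218 of relation 16712/16713/16725".
RELATION_RE = re.compile(r"of relation\s+(\d+)/(\d+)/(\d+)")


def summarize_log(text):
    """Return (severity counts, set of relation paths) for a log string."""
    # Long messages wrap onto the next line, so collapse all whitespace
    # into single spaces before matching.
    flat = " ".join(text.split())
    severities = Counter(re.findall(r"\b(FATAL|PANIC|ERROR):", flat))
    relations = {"/".join(parts) for parts in RELATION_RE.findall(flat)}
    return severities, relations


# A few lines copied from the log above, used only as sample input.
sample = """\
2008-08-23 20:00:27 ERROR: xlog flush request E0/293CF278 is not
satisfied --- flushed only to E0/21B1B7F0
2008-08-23 20:00:27 CONTEXT: writing block 94218 of relation
16712/16713/16725
2008-08-26 11:59:55 PANIC: right sibling's left-link doesn't match
"""

severities, relations = summarize_log(sample)
print(dict(severities), sorted(relations))
```

Running this over the full log should at least show how many distinct
relations the write errors touch and how often the PANIC recurs.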
I found an article online with a similar problem, but no resolution:
http://www.mydatabasesupport.com/forums/postgresql/399079-general-failing-recover-after-panic-shutdown.html