db corruption/recovery help

"Ed L." <pgsql@xxxxxxxxxxxxx> · Mon, 6 Jun 2005 14:16:32 -0600

Someone flipped a breaker switch, and evidently triggered
corruption in one of our major clusters:

$ cat server_log.Mon  
postmaster successfully started
2005-06-06 14:31:11.950 [20124]  LOG:  database system was interrupted being in recovery at 2005-06-06 14:29:01 EDT
        This probably means that some data blocks are corrupted
        and you will have to use the last backup for recovery.
2005-06-06 14:31:11.950 [20124]  LOG:  checkpoint record is at EF/EBB7AFC8
2005-06-06 14:31:11.950 [20124]  LOG:  redo record is at EF/EBA91EF0; undo record is at 0/0; shutdown FALSE
2005-06-06 14:31:11.950 [20124]  LOG:  next transaction id: 577477594; next oid: 89750885
2005-06-06 14:31:11.951 [20124]  LOG:  database system was not properly shut down; automatic recovery in progress
2005-06-06 14:31:11.952 [20124]  LOG:  redo starts at EF/EBA91EF0
2005-06-06 14:31:11.984 [20124]  PANIC:  Invalid page header in block 22376 of 79189398
2005-06-06 14:31:12.275 [20121]  LOG:  startup process (pid 20124) was terminated by signal 6
2005-06-06 14:31:12.275 [20121]  LOG:  aborting startup due to startup process failure

We have backups from 10 hours earlier, but the obvious
question is:  what, if anything, can I do now to salvage the
10 hours of data written since then?

I guess I could zero the block?  I'm a little uncertain since
I don't know what I'm zeroing (pg_database? pg_class?), and 
can't start up to see what that relfilenode maps to...
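
For reference, the block named in the PANIC line can be located on
disk with simple arithmetic. The following is a minimal sketch, not a
recovery procedure. It assumes the default build settings of 8 kB
pages (BLCKSZ = 8192) and 1 GB segment files (RELSEG_SIZE = 131072
blocks). The function names are hypothetical, and anything like this
should only be run against a copy of the file with the postmaster
stopped.

```python
import os

BLCKSZ = 8192          # assumption: default page size in bytes
RELSEG_SIZE = 131072   # assumption: default blocks per 1 GB segment file


def locate_block(relfilenode, block):
    """Return (segment file name, byte offset) for a block number."""
    segno, block_in_segment = divmod(block, RELSEG_SIZE)
    # The first segment has no suffix; later ones are named <relfilenode>.<N>.
    name = str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"
    return name, block_in_segment * BLCKSZ


def read_block(path, offset):
    """Read one page from a file so it can be inspected before any change."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(BLCKSZ)


def zero_block(path, offset):
    """Overwrite one page with zeros in place.

    Any rows stored on that page are lost, so use it on a copy first.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x00" * BLCKSZ)
        f.flush()
        os.fsync(f.fileno())
```

Under those assumptions, locate_block(79189398, 22376) returns
('79189398', 183304192), i.e. the damaged page sits in the first
segment file at byte offset 22376 * 8192. Saving the result of
read_block() to a separate file before calling zero_block() keeps the
original page contents available for later inspection.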

Going to look at it with pg_filedump, maybe oid2name or
whatever that utility is...
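
Since the server will not start, one way to narrow down what 79189398
is without the catalogs is to find which database directory holds the
file. The following is a rough sketch, assuming the usual layout of
$PGDATA/base/<database oid>/<relfilenode> with optional .N suffixes
for additional segments. The helper name is hypothetical and the data
directory path has to be supplied by the caller.

```python
import os
import re


def find_relfilenode(pgdata, relfilenode):
    """List (database oid dir, file name, size in bytes) for matching files.

    Only per-database directories under <pgdata>/base are searched.
    """
    # Match "79189398" and segment files such as "79189398.1",
    # but not longer numbers that merely start with the same digits.
    pattern = re.compile(rf"^{relfilenode}(\.\d+)?$")
    hits = []
    base = os.path.join(pgdata, "base")
    for db_oid in sorted(os.listdir(base)):
        db_dir = os.path.join(base, db_oid)
        if not os.path.isdir(db_dir):
            continue
        for name in sorted(os.listdir(db_dir)):
            if pattern.match(name):
                path = os.path.join(db_dir, name)
                hits.append((db_oid, name, os.path.getsize(path)))
    return hits
```

A hit under base/<oid> means the file belongs to a single database
rather than to a cluster-wide catalog such as pg_database, which lives
under $PGDATA/global. The reported size also shows whether block 22376
lies inside the file at all.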

Thanks,
Ed
