Re: Do I have a corrupted database?

Craig Ringer wrote:
William Garrison wrote:
I fear I have a corrupted database, and I'm not sure what to do.

First, make sure you have a recent backup. If your backups rotate, stop
the rotation so that all currently available historical copies of the
database are preserved from now on - just in case you need them.

Since I made my post, we found that we can't do a pg_dump. :( Every time this error appears in the logs, postgres forcibly closes all connections (including any running instances of pgadmin or pg_dump) while it runs its little recovery process.

We have backups from some days ago plus transaction logs. We also have a snapshot of the file system, and I'm hoping to find a way to attach it to another system. I've had trouble with that in the past.

As for the SAN and the Windows event log: our IT guy says the SAN reported no failures at the time. I don't know much about the SAN itself; I just know it uses dual fiber-channel connections and all the drives are in some RAID array. I think it is also hardened against nuclear strikes and has a built-in laser defense system. At the time of the problem, the Windows event log shows no problems writing to the drives, or any other failures of any kind, really. No other apps crashed, no unusual memory usage, plenty of disk space. So the cause is a complete mystery. :( For now, I'm focused on repair.

We tried to REINDEX each table, but the reindex fails with duplicate key errors. I can fix those records manually, but I was hoping to dump the database, find the duplicates on another system, then delete or repair the bad records and restore onto the production machine. Since pg_dump isn't working, that doesn't look like a viable option.
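
For the duplicate hunt described above, a rough sketch of one possible approach: build a GROUP BY / HAVING query per table and run it with index scans disabled, so the planner reads the heap rather than the index that may be damaged. The helper name `duplicate_key_query`, the table `orders` and its columns are hypothetical; `enable_indexscan` and `enable_bitmapscan` are standard PostgreSQL planner settings. Identifiers are interpolated unquoted, so this assumes trusted, simple names.

```python
def duplicate_key_query(table, key_columns):
    """Return SQL text that lists key values appearing more than once.

    Hypothetical helper: `table` and `key_columns` are assumed to be
    trusted, plain identifiers (no quoting or escaping is done here).
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    cols = ", ".join(key_columns)
    return (
        # Discourage the planner from using a possibly corrupt index.
        "SET enable_indexscan = off;\n"
        "SET enable_bitmapscan = off;\n"
        f"SELECT {cols}, count(*) AS copies\n"
        f"FROM {table}\n"
        f"GROUP BY {cols}\n"
        "HAVING count(*) > 1;"
    )


if __name__ == "__main__":
    # Hypothetical table with a two-column unique key.
    print(duplicate_key_query("orders", ["customer_id", "order_no"]))
```

The generated text can be pasted into psql. Whether the scan completes on a damaged table is a separate question; this only locates rows that violate the unique key.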

Are there any repair tools for a postgres database? Some routine where I can take the database offline, run something like pg_fsck --all, and get back a report or a repair procedure?

Now, if possible dump your database with pg_dump. Restore the dump to a
test database instance and make sure that it all goes OK.

Once that's done, you know you have a decent recovery point to work
from in case you make a mistake during your recovery efforts.

After that I don't have all that much to offer, especially as you're
using an operating system I don't have much experience with Pg on and
you're using an (unspecified) SAN.

Normally I'd ask if you'd verified your RAID array / tested your disks.
In this case, I'm wondering if there's any chance there was a service
interruption on the SAN that might've caused some sort of intermittent
or partial writes.

2008-08-23 20:00:27 ERROR:  xlog flush request E0/293CF278 is not satisfied --- flushed only to E0/21B1B7F0
2008-08-23 20:00:27 CONTEXT:  writing block 94218 of relation 16712/16713/16725
2008-08-23 20:04:36 DETAIL:  Multiple failures --- write error may be permanent.
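
To put a number on the error above, here is a minimal sketch that pulls the two WAL positions out of the ERROR line and reports how far the requested flush point lies beyond what was actually flushed. The function name `flush_gap_bytes` is hypothetical, and the sketch assumes each position is written as hex logid/offset and that a plain offset subtraction is meaningful only when both positions share the same log id (as they do here, E0).

```python
import re

# Matches the message format seen in the log excerpt above.
FLUSH_RE = re.compile(
    r"xlog flush request ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) is not satisfied"
    r" --- flushed only to ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def flush_gap_bytes(log_line):
    """Return requested-minus-flushed offset in bytes, or None.

    None is returned when the line does not match, or when the two
    positions are in different log ids (not handled by this sketch).
    """
    m = FLUSH_RE.search(log_line)
    if m is None:
        return None
    req_id, req_off, done_id, done_off = (int(g, 16) for g in m.groups())
    if req_id != done_id:
        return None
    return req_off - done_off


if __name__ == "__main__":
    line = ("2008-08-23 20:00:27 ERROR:  xlog flush request E0/293CF278 "
            "is not satisfied --- flushed only to E0/21B1B7F0")
    gap = flush_gap_bytes(line)
    print(f"requested position is {gap} bytes past the flushed position")
```

A large positive gap means the block being written carries a log position well beyond the end of the WAL that was flushed, which is consistent with the block or the WAL having been damaged or lost rather than with a small timing glitch.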

Yeah, I'm really wondering about the SAN and SAN connection. What sort
of SAN is it? How is the host connected? Does it have any sort of
logging and monitoring that might let you see if there was a problem
around the time Pg was complaining?

Have you checked the Windows error logs?

--
Craig Ringer



