> They run with fsync off AND they like to toggle the power switch at
> random?  I'd suggest finding other employment --- they couldn't
> possibly be paying you enough to justify cleaning up after stupidity
> as gross as that.

Colo-by-windows.  If there weren't DBAs with Win32 admin tendencies,
I'd be out of work. :)

> Anyway, the errors appear to indicate that there are pages in the
> database with LSN (last WAL location) larger than the actual current
> end of WAL.  The difference is pretty large though --- at least 85MB
> of WAL seems to have gone missing.  My first thought was a corrupted
> LSN field.  But seeing that there are at least two such pages, and
> given the antics you describe above, what seems more likely is that
> the LSNs were correct when written.  I think some page of WAL never
> made it to disk during a period of heavy updates that was terminated
> by a power cycle, and during replay we stopped at the first point
> where the WAL data was detectably corrupt, and so a big chunk of WAL
> never got replayed.  Which of course means there's probably a lot of
> stuff that needs to be fixed and did not get fixed, but in
> particular our idea of the current end-of-WAL address is a lot less
> than it should be.  If you have the server log from just after the
> last postmaster restart, looking at what terminated the replay might
> confirm this.

Peachy.

> You could get the DB to stop complaining by doing a pg_resetxlog to
> push the WAL start address above the largest "flush request"
> mentioned in any of the messages.  But my guess is that you'll find
> a lot of internal corruption after you do it.  Going back to the
> dump might be a saner way to proceed.

Tons of corruption and a backup that's a few weeks old.  *grin*  The
most recent dump seems to have all of the data, but some rows are
there in duplicate.  Thanks for the input.  -sc

--
Sean Chittenden