Good reminder to back up before you start. Yes, I've made a backup of
the db in its current state.
After reading your posts and doing a slew of research I am leaning
more towards restoring from the last successful dump and reloading data
from applications (it looks like it is going to be a partial recovery). I
am concerned that even if we are able to clear all those messages by
whatever means, the underlying state of the database will still be
corrupted. Luckily this happens to be a read-only db and we can live with it.
As for the nature of the corruption, I still do not know what kind of
hardware problems led to this; it happened at one of our clients' sites
and we are still waiting to find out what caused it. One piece of info
we got was that the partition holding the postgres data directory turned
read-only.
Thanks everyone.
Dinesh
P.S. The IDs of the missing clog files are out of range.
On 9/17/2010 8:32 PM, Greg Smith wrote:
Dinesh Bhandary wrote:
Due to a hardware crash we ran into issues where some blocks were
corrupted and some files were missing.
I was able to get past the corrupted blocks (errmsg - "invalid page
header in block 12345 of relation x") by setting
zero_damaged_pages = on and running vacuum afterwards. Now I am running
into situations where pg_clog files are missing (errmsg - "could not
open pg_clog/0D0D"). I have a backup, but it is quite old (considering
it as a last resort). Is there any other way to fix this problem?
I also created empty files to fool postgres, but there are so many
of these files missing I was wondering if there is a better/faster way to
fix this problem.
I hope you made a whole backup of the files before you started trying
to fix the damage, too. It's possible to try and fix this using tricks
like zero_damaged_pages and dummy clog files, only to make things
worse. We do data recovery services here, and I've had to roll back
to the original copy of the data multiple times in order to try
different things before getting a good copy of someone's data back
again. If you don't have a copy of the database yet, do that before
you do any more experimenting with the clog files.
I wrote a summary of links to past work like this you may find useful,
and a little program to create missing pg_clog files that all say "the
transaction you're asking about committed", available at:
http://archives.postgresql.org/pgsql-general/2009-07/msg00985.php
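
If it helps to visualize what such a file contains, here is a rough
Python sketch (not the actual program at that link) that writes one dummy
segment. It assumes a 256 kB segment (32 pages of 8 kB) filled with 0x55
bytes, since each transaction gets two status bits and 0x01 means
committed; the segment name and data directory path are placeholders:

    import os

    SEGMENT = "0D0D"                              # segment name from the error message
    CLOG_DIR = "/var/lib/pgsql/data/pg_clog"      # placeholder data directory layout
    SEGMENT_SIZE = 32 * 8192                      # 32 pages of 8 kB = 256 kB per segment

    path = os.path.join(CLOG_DIR, SEGMENT)
    if not os.path.exists(path):                  # never overwrite a real clog file
        with open(path, "wb") as f:
            # 0x55 = binary 01010101: four transactions per byte, all marked committed
            f.write(b"\x55" * SEGMENT_SIZE)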
You might further script that to speed up how fast you can fix these
as they pop up, which brings the test/correct cycle time down. You
might even write a script that loops over starting the database, looks
at the end of the log file, and if it's yet another one of these
missing files, extracts its number, recreates it, and starts again.
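
A rough sketch of that loop in Python, assuming pg_ctl is on the PATH
and guessing at the data directory, log file location, and exact error
text, all of which you would adjust to your installation:

    import os, re, subprocess, time

    DATADIR = "/var/lib/pgsql/data"                               # placeholder
    LOGFILE = os.path.join(DATADIR, "pg_log", "postgresql.log")   # placeholder
    CLOG_DIR = os.path.join(DATADIR, "pg_clog")
    SEGMENT_SIZE = 32 * 8192

    # Pattern guessed from the errors quoted above; check your own log wording.
    missing = re.compile(r'could not open file "pg_clog/([0-9A-F]{4})"')

    while True:
        subprocess.run(["pg_ctl", "-D", DATADIR, "-w", "start"])
        time.sleep(5)                                             # give it a moment to log
        with open(LOGFILE) as f:
            hits = missing.findall(f.read())
        # Only consider segments not created yet (the log keeps the old errors too).
        todo = [s for s in hits if not os.path.exists(os.path.join(CLOG_DIR, s))]
        if not todo:
            break                                                 # no new complaints
        for seg in set(todo):
            with open(os.path.join(CLOG_DIR, seg), "wb") as f:
                f.write(b"\x55" * SEGMENT_SIZE)                   # fake "all committed" segment
        subprocess.run(["pg_ctl", "-D", DATADIR, "-m", "fast", "stop"])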
Unfortunately, doing better than that is tricky. The last time I ran
into one of these that was badly corrupted and missing a whole lot of
real clog files, not just ones that were unlikely to exist, we had to
modify the PostgreSQL source code to create them automatically in order
to handle it safely. You should be looking at the number of each one
of these as it's requested. If it is way outside the range of the
active clog files you have, that's probably one you can create safely
because it's garbage data anyway. But if it starts asking for clog
files that are in the middle or near the ends of the set you've got,
you may have a bigger problem on your hands.
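
One quick way to make that comparison is to list the segments you
actually have and see where the requested one falls; a small sketch,
with a placeholder path and segment name:

    import os

    CLOG_DIR = "/var/lib/pgsql/data/pg_clog"      # placeholder
    requested = "0D0D"                            # segment from the error message

    # Segment names are hexadecimal, so compare them numerically.
    existing = sorted(int(name, 16) for name in os.listdir(CLOG_DIR))
    lo, hi = existing[0], existing[-1]

    if not (lo <= int(requested, 16) <= hi):
        print("outside the active range - probably garbage, likely safe to fake")
    else:
        print("inside the active range - a real segment is gone, be careful")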
P.S. Make sure you dump a whole copy of the database the minute you
get it started again and reload that before you start using it. You
have no idea what state all of the tables are really in after a crash
like this without such an exercise.
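
A minimal sketch of that dump-and-reload step, assuming pg_dump, createdb
and pg_restore are on the PATH; the database names are placeholders:

    import subprocess

    # Dump everything in custom format as soon as the server is up again...
    subprocess.run(["pg_dump", "-Fc", "-f", "mydb.dump", "mydb"], check=True)

    # ...then rebuild a fresh database from that dump rather than trusting
    # the crashed cluster's files.
    subprocess.run(["createdb", "mydb_restored"], check=True)
    subprocess.run(["pg_restore", "-d", "mydb_restored", "mydb.dump"], check=True)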