Re: Data corruption zero a file - help!!

Michael Fuhr <mike@xxxxxxxx> · Wed, 8 Mar 2006 23:15:46 -0700

On Thu, Mar 09, 2006 at 03:57:46PM +1100, Noel Faux wrote:
> Given that this seems problem has occurred a number of times for a 
> number I've written a small step by step procedure to address this 
> issue.  Is there any other comments you which to add.  I was thinking 
> that this should be added to the FAQ / troubleshooting in the docs.
> 
> How to repair corrupted data due to "ERROR: invalid page header in block 
> X of relation "Y":

The word "repair" might be misleading.  The operation repairs the
table in a sense, but as the following caution points out it does
so by completing the destruction that something else began.

> CAUTION this will permanently remove the data defined in the bad block
> 
>   1. To identify which file(s) the relation is in:
>         1. <THE SELECT STATEMENTS TO COLLECT THIS DATA> I can't 
> remember how I did it, but will keep hunting for my notes :)

Hint: pg_class.relfilenode

http://www.postgresql.org/docs/8.1/interactive/catalog-pg-class.html

>         2. so the file(s) are $PGDATA/<databaseDIR>/<relFile>.* The * 
> is any number which is defined below

You might want to show how to determine <databaseDIR>.  The actual
location might not be under $PGDATA -- 8.0 and later have tablespaces
and earlier versions support alternative locations, so instructions
should account for that.  Also, relations smaller than 1G won't have
any .N files.

http://www.postgresql.org/docs/8.1/interactive/manage-ag-tablespaces.html
http://www.postgresql.org/docs/8.1/interactive/storage.html
http://www.postgresql.org/docs/7.4/interactive/manage-ag-alternate-locs.html

>   2. To calculate the * value:
>         1. SELECT <block> / 131072 AS filenum, <block> % 131072 AS blocknum;
>            filenum    |  blocknum
>         -----------+----------------
>          <filenum> | <blocknum>
>               1. 131072 comes from "each database file is 1G, or 131072 
> * 8k blocks"

The 1G figure obviously applies only to tables that require that
much space.  If <filenum> comes back zero then you'd use the file
without any .N suffix.  If the bad block is less than 131072 (or
however many other-than-8k blocks fit in 1G) then you needn't bother
with the calculation.

>   3. Now you need to re-zero this block using the following command:
>         1. dd bs=8k seek=<blocknum> conv=notrunc count=1 if=/dev/zero 
> of=$PGDATA/base/<databaseDIR>/<relFile>.<filenum>

I'd recommend testing the command on a throwaway file before working
with real data -- "measure twice, cut once" as it were.  To gain
confidence in what you're doing you could create a test table,
populate it with data, corrupt its data file, then zero its bad
blocks until you can select all of the remaining data.  Playing
around in a production database is probably a bad idea; a safer way
would be to initdb a test cluster and run a separate postmaster
(listening on a different port if you're on the same machine as the
real database).

It's probably best to shut down the postmaster while you're mucking
around with the data files.

>               1. Before you do this it is best to backup the block:  
> "dd bs=8k skip=<blocknum> count=1 if=/path/file | hd"

This command doesn't back up the block, it pipes the block into a
command that on some systems will display a hex and ASCII dump of
the data (some systems will require a command other than hd).  You
could back up the block by redirecting the dd output to a file
instead of piping it into another command.

Incidentally, I was looking at your web site and your project
might make an interesting case study for the PostgreSQL web site
(Community -> In The Real World -> Case studies).

http://www.postgresql.org/about/casestudies/

Some users and potential users might be interested in reading about
how you're using PostgreSQL with a 100G+ database.  Post a message
to pgsql-www if you'd be interested in providing a write-up.

-- 
Michael Fuhr