Re: block error, but can't pg_dump

NOW Web Sites Manager <webperson@xxxxxxx> · Mon, 16 Jun 2008 22:39:58 -0700

Thanks for your reply.  I'm afraid my Linux/Postgresql
isn't quite good enough to use your answer.

1) Yes, I was selecting the records by the primary key.

2) Much of the data in this table can be dumped.  It's helpful for
reports but it would be fine to archive it or even zero it out if it
would get this working again.  I'd love to save recent data somehow,
though, and the pg_dump only dumps the oldest.

3) I'm using Red Hat Linux, but when I tried to install the rhdb it
failed on make although I downloaded version 4.0.  I am using
Postgresql 8.0.13 -- I know I should upgrade but I needed an older
version for older software I was using and then even now I shouldn't
upgrade until I can backup everything.

4) I don't quite know how to get the OID of my database (directory name)
and the relfilenode of the table.  If I try and connect to
pg_database (psql -d pg_database) when signed in as the postgresql user,
I get:

psql: FATAL:  database "pg_database" does not exist

I've looked online but can't find a page to describe what to do. 

I've thought about using delete * where... from the database to delete
the older records in case that fixes it, but select * where... for older
records doesn't work, so I assume delete won't either?  I don't want
to try this if it doesn't even get me anywhere.

Thanks so much for your help.

At 03:44 PM 6/13/2008 -0400, Tom Lane wrote:

NOW Web Sites Manager
<webperson@xxxxxxx> writes:

> I have a table that reports one block error when I try to select

> everything, reindex, vacuum, etc. (invalid page header in block

> 413620 of relation "tablename")  I've read on the
list that the thing 

> to do is use pg_dump and then restore it.  However, pg_dump
fails 

> with the same error.

> I can't take down the whole server, let alone easily Postgresql, to

> debug this.  Any ideas as to how I could do this?  I wrote
a Perl 

> script to select the records one by one but it didn't find a bad

> record although maybe it's how I wrote the script (it just selected

> the records one by one.)

Hm, were you selecting the records by primary key?  (or some other
way

that would have produced an index search?)  If so, the above
suggests

that the corrupted block is one that happens not to contain any live

data, which would be a lucky break for you.

Personally I'd want to know what was going on before taking any
drastic

measures, so I'd try to dump out the block with pg_filedump:

http://sources.redhat.com/rhdb/

However, if you're comfortable that your perl script was able to

retrieve everything you care about from the table, you could skip

that and go directly to zeroing out the block, which is the
appropriate

recovery action when you don't care about the data in it.  The
usual

way to do that is with dd:

        dd
if=/dev/zero of=TARGETFILE bs=8k seek=413620 count=1

(Sorry, no idea how to do it if you're on Windows.)

The trick is to be sure you've got the right target file.  You need
the

OID of your database (directory name) and the relfilenode (NOT OID)

of the table, which you get from pg_database and pg_class
respectively.

Then the file is $PGDATA/base/DBOID/RELFILENODE.

Hm, wait a second, block 413620 would be past the 1-GB mark, well

past it in fact.  What you're actually gonna need is

        dd
if=/dev/zero of=TARGETFILE.3 bs=8k seek=20404 count=1

ie the filename you want is RELFILENODE.3 not just RELFILENODE.

You might want to read the "Database Physical Storage" chapter
of

TFM to be sure you follow all this before you start zapping
data.

        regards, tom
lane