Hello,
--
we have a database that apparently got corrupted in a crash a few months ago, long enough that there is no "good" backup.
There are at least two tables occasionally emitting errors like this:
ERROR: invalid page header in block 39684 of relation base/12926/32397 while performing SQL query: SQL DML: insert into foo (a, b, c) values (?, ?, ?) | PARAMETERS: foo, bar, baz: org.postgresql.util.PSQLException: ERROR: invalid page header in block 39684 of relation base/12926/32397
The files in question (e.g. 32397) correspond to the table itself, not an index. The larger of these tables has two files (32397 & 32397.1), almost 2G total.
To make things worse, vacuum ends with:
ERROR: could not access status of transaction 2828785257
DETAIL: Could not open file "pg_clog/0A89": No such file or directory.
DETAIL: Could not open file "pg_clog/0A89": No such file or directory.
That file is nowhere to be found.
When I do this:
set zero_damaged_pages = on;
select count(*) from foo
It shows 3 errors like this:
WARNING: invalid page header in block 78550 of relation base/12926/31642; zeroing out page
WARNING: invalid page header in block 78551 of relation base/12926/31642; zeroing out page
WARNING: invalid page header in block 78552 of relation base/12926/31642; zeroing out page
WARNING: invalid page header in block 78551 of relation base/12926/31642; zeroing out page
WARNING: invalid page header in block 78552 of relation base/12926/31642; zeroing out page
One idea I found is to generate a zero file for the missing clog, set zero_damaged_pages=on, then vacuum or recreate the table. Looks like it might work, but I'm trying to understand the implications.
1. When I generate a zero clog file like this, what impact will it have on everything? Can it break anything else in the database, introduce some kind of inconsistency with other (correctly committed) table data?
2. How can I gauge how much actual data (e.g. how many rows) will be lost by doing this (zero clog file + vacuum/recreate table)?
3. Is there some place I can look to reliably tell how many rows the table should have, so I can compare to how many I can read after setting zero_damaged_pages?
4. Is there any way at all to recover/repair the corrupted data at this point?
Konrad Garus