Re: Data corruption

On Wed, 2021-07-21 at 18:14 +0200, Konrad Garus wrote:
> we have a database that apparently got corrupted in a crash a few months ago, long enough that there is no "good" backup.
> 
> There are at least two tables occasionally emitting errors like this:
> 
> ERROR: invalid page header in block 39684 of relation base/12926/32397 while performing SQL query: SQL DML: insert into foo (a, b, c) values (?, ?, ?)
>  | PARAMETERS: foo, bar, baz: org.postgresql.util.PSQLException: ERROR: invalid page header in block 39684 of relation base/12926/32397
> 
> The files in question (e.g. 32397) correspond to the table itself, not an index.
>  The larger of these tables has two files (32397 & 32397.1), almost 2G total.
> 
> To make things worse, vacuum ends with:
> 
> ERROR:  could not access status of transaction 2828785257
> DETAIL:  Could not open file "pg_clog/0A89": No such file or directory.
> 
> That file is nowhere to be found.
> 
> When I do this:
> 
> set zero_damaged_pages = on;
> select count(*) from foo
> 
> It shows 3 errors like this:
> 
> WARNING:  invalid page header in block 78550 of relation base/12926/31642; zeroing out page
> WARNING:  invalid page header in block 78551 of relation base/12926/31642; zeroing out page
> WARNING:  invalid page header in block 78552 of relation base/12926/31642; zeroing out page
> 
> One idea I found is to generate a zero file for the missing clog, set zero_damaged_pages=on,
>  then vacuum or recreate the table. Looks like it might work, but I'm trying to understand the implications.
> 
> 1. When I generate a zero clog file like this, what impact will it have on everything?
>    Can it break anything else in the database, introduce some kind of inconsistency with other (correctly committed) table data?

Certainly it can.  Rather than a zeroed file, I would create a fake clog file filled with
the byte 0x55, which corresponds to all transactions being committed, so that you "see"
more of the damaged data.
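
Just to sketch what I mean, assuming a standard build where one clog segment is 256 kB,
and assuming you stop the server and keep a file-level copy of the data directory as a
safety net first (adjust paths and file ownership to your installation):

  cd $PGDATA
  # 262144 bytes of 0x55: each byte holds four 2-bit transaction statuses,
  # and 01 01 01 01 marks all of them as committed
  dd if=/dev/zero bs=262144 count=1 | tr '\0' '\125' > pg_clog/0A89
  chown postgres: pg_clog/0A89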

But of course there is no guarantee that the data are consistent.  For example,
foreign keys could be violated.  You will have to fix that data manually
before you can create the constraints.

> 2. How can I gauge how much actual data (e.g. how many rows) will be lost by doing this (zero clog file + vacuum/recreate table)?

You'd have to look at the binary data of the affected blocks to get an idea.
The "pageinspect" contrib module can extract a block from a table and show you
what is in it.
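
For example, something like this shows the page header and the item pointers of one
block (the database name is a placeholder; table name and block number are taken from
your error messages; pageinspect has to be installed by a superuser):

  psql -d yourdb -c "CREATE EXTENSION IF NOT EXISTS pageinspect"
  psql -d yourdb -c "SELECT * FROM page_header(get_raw_page('foo', 78550))"
  psql -d yourdb -c "SELECT lp, lp_flags, t_xmin, t_xmax FROM heap_page_items(get_raw_page('foo', 78550))"

For a page whose header is really broken you will hit the same "invalid page header"
error unless zero_damaged_pages is on, in which case you only get zeroes back; a hex
dump of the underlying file is then the only way to look at the original bytes.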

> 3. Is there some place I can look to reliably tell how many rows the table should have,
>    so I can compare to how many I can read after setting zero_damaged_pages?

No.

> 4. Is there any way at all to recover/repair the corrupted data at this point?

With the techniques you describe, you can salvage some data.  By carefully reading the
damaged blocks, you may glean some more.

But typically, at that point you are happy if you lose only some data.  You may have to
delete some more to make the data consistent.  Then cut your losses, dump what you can
salvage, and load that dump into a new cluster on good hardware.
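
To spell out that last step (database name, dump file name and the port of the new
cluster are of course just placeholders):

  pg_dump -Fc -f yourdb.dump yourdb     # on the damaged cluster
  createdb -p 5433 yourdb               # on the new cluster
  pg_restore -p 5433 -d yourdb yourdb.dump

pg_dump reads the tables through ordinary SQL, so it will only run through once the
damaged pages have been zeroed or the offending rows deleted.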

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com