Re: ERROR invalid page header in block xxx of relation base/xxxxx/xxxxx/

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Looping pgsql-general mail list


I see I am not clear in my question , below are the order of events we see when  we get a  invalid page header in block corruption

-Windows server crashed/restarted due to power failure ( we believe)  ( I see that write-cache / write back cache / Disk cache are enabled and we don't have battery backup )
-Posgres DB crashes (LOG:  database system was not properly shut down; automatic recovery in progress )
- After around 3hrs after crash we see below messages in log

2016-03-03 20:13:18 GMT ERROR:  invalid page header in block 204 of relation base/16413/16900
2016-03-03 20:13:18 GMT CONTEXT:  automatic analyze of table "a.parametertable" => this is a relation not Index

- select count(*) on table gives invalid page header message , where as  select count(*) order by primary key does not give any issue
- After clearing the blocks using zero_damaged_pages , total count of table rows  using the table and order by primary key matches which means there is no dataloss ( I havent verified each record by record , I assume there is no dataloss) please correct ??

I would like to know what was the cause of invalid page header and is there any way I can reproduce this error ?

Appreciate your comments and suggestions on this ?

-Sreekanth

On Thu, Dec 8, 2016 at 12:55 PM, Sreekanth Palluru <sree4pg@xxxxxxxxx> wrote:

Forgot to loop community 


Hi Jorge/All,
Thanks for the reply.
As per the documentation , I think backend reads the this page header and reports that it is damaged.  
I am looking at ways we re-create this scenario of creation this blank corrupted page ? do I have any control over Backend and after it initializes a new page using function PageInit(Page page, Size pageSize, Size specialSize) and I want to halt/crash the backend by stopping postgres database or through Hardware/system crash ?
Is this possible ?

Also I see that based on below comments from Source code, having such pages in database is normal . is my understanding correct ?


/*
63  * PageIsVerified
64  * Check that the page header and checksum (if any) appear valid.
65  *
66  * This is called when a page has just been read in from disk. The idea is
67  * to cheaply detect trashed pages before we go nuts following bogus item
68  * pointers, testing invalid transaction identifiers, etc.
69  *
70  * It turns out to be necessary to allow zeroed pages here too. Even though
71  * this routine is *not* called when deliberately adding a page to a relation,
72  * there are scenarios in which a zeroed page might be found in a table.
73  * (Example: a backend extends a relation, then crashes before it can write
74  * any WAL entry about the new page. The kernel will already have the
75  * zeroed page in the file, and it will stay that way after restart.) So we
76  * allow zeroed pages here, and are careful that the page access macros
77  * treat such a page as empty and without free space. Eventually, VACUUM
78  * will clean up such a page and make it usable.
79  */
zero_damaged_pages (boolean)

Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to on causes the system to instead report a warning, zero out the damaged page in memory, and continue processing. This behavior will destroy data, namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering data from the damaged pages of a table. Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again. The default setting is off, and it can only be changed by a superuser.


On Thu, Dec 8, 2016 at 9:50 AM, Jorge Torralba <jorge.torralba@xxxxxxxxx> wrote:
Look into this setting.

zero_damaged_pages  = on;

The docs should explain it.



On Wed, Dec 7, 2016 at 2:47 PM, sreekanth Palluru <sree4pg@xxxxxxxxx> wrote:
Hi ,

I am getting invalid page header error and what I could observe is when I select the table I get this error , where as if I select table and order by primary key I can retrieve the rows from table.

And I don't see any dataloss ( based on total number records) after fixing the blocks using zero_damaged_pages=on and then vacuum full on the table


Please note I have renamed few tables to avoid giving actual table names

We run Postgres 9.2 version on Windows and Dell Optiplex XE2 server and we suspect that block corruption would have happen due to hardware/memory/power failures reasons and I have gone through wikik https://wiki.postgresql.org/wiki/Reliable_Writes.

I want to understand why we can read the table through where there exists index and explain plan shows Index scan and with high cost compare to seq scan .
I assume that since there no rows/data present in these corrupted blocks index scan skips these blocks and hence it is not throwing the error . 

Also , I want to know what would have caused the postgres to create these corrupted blocks and 
can I reproduce this error ? appreciate if you share any pointers to blogs/mailing lists if this type of issue is already discussed ?


create table a.parametertable_bak as select * from a.parametertable order by id;

labs=# select count(*) from a.parametertable_bak ;
 count
-------
 31415
(1 row)


labs=#
labs=# checkpoint;
CHECKPOINT
labs=# set zero_damaged_pages=on;
SET
labs=# vacuum full a.parametertable;
WARNING:  invalid page header in block 204 of relation base/16413/16900; zeroing out page
WARNING:  invalid page header in block 205 of relation base/16413/16900; zeroing out page
VACUUM
labs=# select count(*) from a.parametertable ;
 count
-------
 31415
(1 row)


labs=#
-Sreekanth



--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.



--
Regards
Sreekanth



--
Regards
Sreekanth



--
Regards
Sreekanth

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux