Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

Cédric Villemain <cedric.villemain.debian@xxxxxxxxx> · Thu, 22 Apr 2010 18:02:00 +0200

2010/4/22 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:
> Στις Thursday 22 April 2010 16:53:05 ο/η Cédric Villemain έγραψε:
>> 2010/4/22 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:
>> > Hello,
>> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
>> > I think that this server was under some sort of constant resets or hardware failures.
>> > Initially,i had this problem:
>> > ERROR:  invalid page header in block 672720 of relation "pg_toast_125716009"
>> >
>> > This toast table corresponds to a table named "mail_message",
>> >                                Table "public.mail_message"
>> >  Column   |       Type        |                         Modifiers
>> > -----------+-------------------+-----------------------------------------------------------
>> >  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
>> >  msgsource | bytea             |
>> > Indexes:
>> >    "mail_message_key" PRIMARY KEY, btree (msgno)
>> >
>> > (obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
>> > I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
>> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
>> >
>> > i found the oid of the table:
>> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
>> >  tableoid  | ?column?
>> > -----------+----------
>> >  125716013 |        1
>> >
>> > (and just to verify)
>> > SELECT relname from pg_class where oid=125716013;
>> >      relname
>> > --------------------
>> >  pg_toast_125716009
>> >
>> > Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
>> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
>>
>> segment have 1.1GB size maximum. You have to catch in what segment the
>> faulty block is, and reajust the block value from the error report to
>> the real one in the good segment.
>>
>
> Thanx,
> Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c
> i see the the error comes from function mdnblocks
>
> if (nblocks > ((BlockNumber) RELSEG_SIZE))
>                        elog(FATAL, "segment too big");
>
> That means, that some segment file is bigger than RELSEG_SIZE
> At least in my system:
> #define BLCKSZ   8192
> #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB)
>
> Currently i dont have any access to the machine but tomorrow i will check the file sizes.
>
> Can anyone shed some light as to some method of identifying all the segment files of a table?
> The first one has the same name as the tableoid.
> How about the subsequent segments?

Your execution of dd make your first segment bigger than expected.
Other segment have the same name with a .1 .2 etc  suffix.

You have to shrink your first segment to the correct size.
check what happens, you should have now the original error.

And, I have never used it, but I think it is the purpose of
zero_damaged_pages to parameter to allow postgresql itself to zero the
bad black. (reading
src/backend/storage/buffer/bufmgr.c confirm that.

*BUT* take care that it will zero *every* bad page, perhaps not only
the one trapping an error.

In those situation, it is good to make a snapshot of the pgdata
directory, in case your fingers surf  too fast on the keyboard....

If you don't want to activate zero_damage_page,  then go and calculate
which block in which segment you have to zeroing.

side note, it may be usefull to have the relevant information in the
error message...

>
>> >
>> > However, after that, unfortunately i get constant postgresql server restarts with:
>> > FATAL:  segment too big
>> > server closed the connection unexpectedly
>> >        This probably means the server terminated abnormally
>> >        before or while processing the request.
>> > The connection to the server was lost. Attempting reset: Succeeded.
>> >
>> > Is there anything i can do to savage the situation?
>> >
>> > (one of) the hard part here is that i dont have neither physical nor network access to the server
>> > (only ultra expensive unreliable satellite comms)
>> >
>> > Thanks for any hints...
>> >
>> > --
>> > Achilleas Mantzios
>> >
>> > --
>> > Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-admin
>> >
>>
>>
>>
>
>
>
> --
> Achilleas Mantzios
>

-- 
Cédric Villemain

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin