Could not open file "pg_clog/...."

"Markus Wollny" <Markus.Wollny@xxxxxxxxxxx> · Tue, 12 May 2009 12:04:58 +0200

Hello!

Recently one of my PostgreSQL servers has started throwing error
messages like these:

ERROR:  could not access status of transaction 3489956864
DETAIL:  Could not open file "pg_clog/0D00": Datei oder Verzeichnis
nicht gefunden. (file not found)

The machine in question doesn't show any signs of a hardware defect,
we're running a RAID-10 over 10 disks for this partition on a 3Ware
hardware RAID controller with battery backup unit, the controller
doesn't show any defects at all. We're running PostgreSQL 8.3.5 on that
box, kernel is 2.6.18-6-amd64 of Debian Etch, the PostgreSQL binaries
were compiled from source on that machine.

I searched the lists and though I couldn't find an exact hint as to
what's causing this, I found a suggestion for a more or less hotfix
solution:
Create a file of the required size filled with zeroes and then put that
into the clog-directory, i.e.
dd bs=262144 count=1 if=/dev/zero of=/tmp/pg_clog_replacements/0002
chown postgres.daemon /tmp/pg_clog_replacements/0002
chmod 600 /tmp/pg_clog_replacements/0002
mv /tmp/pg_clog_replacements/0002 /var/lib/pgsql/data/pg_clog

I know that I'd be loosing some transactions, but in our use case this
is not critical. Anyway, this made the problem go away for a while but
now I'm getting those messages again - and indeed the clog-files in
question appear to be missing altogether. And what's worse, the
workaround no longer works properly but makes PostgreSQL crash:

magazine=# vacuum analyze pcaction.article;
PANIC:  corrupted item pointer: 5
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

And from the logfile:

<2009-05-12 11:38:09 CEST - 6606: [local]@magazine>PANIC:  corrupted
item pointer: 5
<2009-05-12 11:38:09 CEST - 6606: [local]@magazine>STATEMENT:  vacuum
analyze pcaction.article;
<2009-05-12 11:38:09 CEST - 29178: @>LOG:  server process (PID 6606) was
terminated by signal 6: Aborted
<2009-05-12 11:38:09 CEST - 29178: @>LOG:  terminating any other active
server processes
<2009-05-12 11:38:09 CEST - 6607:
192.168.222.134(57292)@magazine>WARNING:  terminating connection because
of crash of another server process
<2009-05-12 11:38:09 CEST - 6607:
192.168.222.134(57292)@magazine>DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted
shared memory.
<2009-05-12 11:38:09 CEST - 6569: 192.168.222.134(57214)@bluebox>HINT:
In a moment you should be able to reconnect to the database and repeat
your command.
[...]
<2009-05-12 11:38:09 CEST - 29178: @>LOG:  all server processes
terminated; reinitializing
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  database system was
interrupted; last known up at 2009-05-12 11:37:51 CEST
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  database system was not
properly shut down; automatic recovery in progress
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  redo starts at 172/8B4EE118
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  record with zero length at
172/8B6AD510
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  redo done at 172/8B6AD4E0
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  last completed transaction was
at log time 2009-05-12 11:38:09.550175+02
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  checkpoint starting: shutdown
immediate
<2009-05-12 11:38:09 CEST - 6619: @>LOG:  checkpoint complete: wrote 351
buffers (1.1%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=0.008s, sync=0.000 s, total=0.009 s
<2009-05-12 11:38:09 CEST - 6622: @>LOG:  autovacuum launcher started
<2009-05-12 11:38:09 CEST - 29178: @>LOG:  database system is ready to
accept connections

Now what exactly is causing those missing clog files, what can I do to
prevent this and what can I do to recover my database cluster, as this
issue seems to prevent proper dumps at the moment?

Kind regards

   Markus

Jede Stimme zahlt, jetzt voten fur die besten Games: www.bamaward.de

Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard 
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general