Re: Where art thou pg_clog?

Casey Duncan <casey@xxxxxxxxxxx> · Thu, 15 Feb 2007 17:39:04 -0800

On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote:

Casey Duncan wrote:

Interestingly I can manually vacuum that table in all of the
databases on this machine without provoking the error.

Except template0 I presume?  Is this autovacuum running in template0
perchance?  I note that 800 million transactions have passed  
since the
Xid in the error message was current.

Wouldn't you know it! A little farther back up in the log file:

2007-02-15 14:20:48.480 PST LOG:  autovacuum: processing database
"template0"
2007-02-15 14:20:48.480 PST DEBUG:  StartTransaction
2007-02-15 14:20:48.480 PST DEBUG:  name: unnamed; blockState:
DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,
children: <>
2007-02-15 14:20:48.481 PST DEBUG:  autovacuum: VACUUM FREEZE whole
database
2007-02-15 14:20:48.481 PST DEBUG:  CommitTransaction
2007-02-15 14:20:48.481 PST DEBUG:  name: unnamed; blockState:
STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,
children: <>

This is a bug we fixed in 8.1.7.  I suggest you update to the  
latest of
the 8.1 series, to get that fix among others.

ok, great.

To fix the problem, set pg_database.datallowconn=true for template0,
then connect to it and do a VACUUM FREEZE.  Then set  
datallowconn=false
again.

Do you mean to do this after upgrading to 8.1.8? If I try than in  
8.1.5, I get (unsurprisingly):

% psql -U postgres template0 -c "vacuum freeze"
ERROR:  could not access status of transaction 2565134864
DETAIL:  could not open file "pg_clog/098E": No such file or directory

I'm curious how template0 got stomped on. Certainly nothing's been
changing it. Of course it might just be some random bug so the fact
it landed on a file for template0 could be completely arbitrary.

The problem is that all databases are vacuumed every so many
transactions, to avoid Xid wraparound problems; even non connectable
databases.  The problem is that a bug in autovacuum caused that vacuum
operation to neglect using the FREEZE flag; this negligence makes it
leave non-permanent Xids in the tables, leading to the problem you're
seeing.

Ironically we were earlier bitten by the bug that autovacuum didn't  
do the cluster-wide vacuum until too late. Now we got bitten by the  
fact that did do the cluster-wide vacuum. Talk about damned-if-you-do- 
and-damned-if-you-don't! 8^)

ok, this is a much better sounding explanation than "random data  
corruption" ;^)

Thanks!

-Casey