I'm running 7.4 on RHAS 4, and I think I've had a transaction id
wraparound issue in a stats database we have. Running the command below gives the suitably
worrying negative number:
[dbname]=# SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
------------------+-----------
[maindbname] | -2081610471
[otherdbname] | 1075601025
[otherdbname] | 1257289757
[otherdbname] | 1074582099
[otherdbname] | 1257289757
Which is weird - because I have vacuumed the database quite a lot -
both individual tables and I thought a vacuum of the whole database a
month or so ago.
Anyway - not noticed any data loss yet and was hoping it would be such
that if all tables had been vacuumed recently (including system catalog
tables), that there would be no remaining rows that would appear to
have a future xid and so the database should be ok?
Obviously I'm now doing the write thing with a vacuumdb -a - however
this has been running 9 hours now and looks like at least 7 hours to
go just on this one monstrous table
in the interests of risk reduction I've just knocked up a script to run
ahead and quickly vacuum all the other tables.
But my questions are thus...
a) is my assumption about the database being ok correct - assuming all
tables have been vacuumed recently, including catalog tables?
b) is it possible to safely abort my whole table vacuum now so I can
run it at the weekend when there's less traffic?
c) if I have experienced data loss, on the assumption all the table
structure remains (looks like it does), and I have a working backup
from before the xid wraparound (I do), can I just reinsert any
detected-missing data at the application level without needing a
dump/reload?
Any help appreciated in this really not-fun time,
thanks
S