Update...
I let the vacuum run on the db overnight, and I see there are about 10 tables that were never vacuumed (down from 20 tables). This is the query I am using to find them:
select t.relname, t.last_vacuum, t.last_autovacuum, c.relfrozenxid, age(c.relfrozenxid)
from pg_stat_user_tables t
join pg_class c on c.oid = t.relid
where t.schemaname = 'weather_data'
  and c.relkind = 'r'
  and t.last_vacuum is null
  and t.last_autovacuum is null
-- ordering by last_vacuum/last_autovacuum is a no-op here (both are NULL by the WHERE clause),
-- so order by xid age instead to see the oldest tables first
order by age(c.relfrozenxid) desc;
I should add that there are more than 1000 tables (each around 14 GB) whose age(relfrozenxid) is also high.
I could again take the database into single-user mode and let it run, but I am not sure whether vacuum would get to those 10 tables first. Can someone shed some light on the order in which manual vacuum or autovacuum picks up tables to vacuum?
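If the order is not something I can count on, my fallback is to drive it myself: generate per-table VACUUM FREEZE commands, oldest first, and run them one at a time. Something along these lines (just a sketch; it assumes the same weather_data schema as above and a server new enough to have format()):

-- Generate one VACUUM FREEZE statement per table, oldest relfrozenxid first,
-- so the tables can be vacuumed individually instead of waiting for
-- vacuum to choose them in its own order.
select format('VACUUM FREEZE %I.%I;', n.nspname, c.relname) as cmd,
       age(c.relfrozenxid) as xid_age
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname = 'weather_data'
order by age(c.relfrozenxid) desc;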
Regarding running postgres --single in the background: I used disown to let it run overnight, but I had to kill the process to open the DB for read-only sessions. When I checked pg_stat_user_tables.last_vacuum, I don't see that any table was vacuumed, even though the process ran for 8 hours. Does killing the single-user-mode process prevent "vacuum" from registering the fact that it vacuumed tables?
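As a cross-check, I plan to watch age(relfrozenxid) directly rather than last_vacuum, since relfrozenxid is stored in pg_class and is advanced by VACUUM itself, while last_vacuum comes from the statistics views. Roughly (again just a sketch against the weather_data schema):

-- If the freeze actually completed on a table, its age(relfrozenxid) should
-- have dropped sharply, even if last_vacuum was never updated.
select n.nspname, c.relname, age(c.relfrozenxid) as xid_age
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname = 'weather_data'
order by age(c.relfrozenxid) desc
limit 20;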
Thanks again, and I hope that there is enough intrigue and information here to get someone willing to help me out.
Ganesh
From: Ganesh Kannan
Sent: Wednesday, February 8, 2017 1:48:10 AM
To: pgsql-admin@xxxxxxxxxxxxxx
Subject: Recovery from Transaction wraparound failure
Hi All,
One of the production databases we own is having a transaction wraparound issue. This is a large database with more than 2000 tables, 70 TB in size. Most of the tables are between 14 and 20 GB. The workload is predominantly read-only, but the database gets millions of new rows each day in batches. There are about 20 tables that were never vacuumed manually or picked up by autovacuum, and I suspect those may be the ones causing the issue.
At first, I started seeing this error for all writes: sqlerrm: database is not accepting commands to avoid wraparound data loss in database "db1", sqlstate: 54000
1) After reading the documentation, I thought I could try to vacuum those 20 never-vacuumed tables individually (each with age(relfrozenxid) above 2 billion) in single-user mode, but it is not working (error copied below). Is there a way to vacuum individual tables in single-user mode? I would rather not vacuum all tables (a plain "vacuum" command), as that would probably take several days.
backend> vacuum freeze schema.wd_p51_y2015;
2017-02-07 23:25:15 EST [54045]: [8-1] user=,db=,app=,client= WARNING: database "db1" must be vacuumed within 999999 transactions
2017-02-07 23:25:15 EST [54045]: [9-1] user=,db=,app=,client= HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. You might also need to commit or roll back old prepared transactions.
2) If vacuuming the whole db is my only option, how can I run single-user mode in the background with the vacuum going? Could someone please share an example? I can only execute "postgres --single DBNAME" in interactive mode and could not figure out how to send it to the background without the server terminating on me. I only have remote access to this db server and need a way to kick off the vacuum in single-user mode in the background. (A rough sketch of what I have come up with so far is below, after item 3.)
3) Restoring from backup is an option for us, but given the db size and the time it takes to build indexes, I am hoping to hear opinions from an expert or two first.
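For item 2, the closest I have come up with is piping the command into the single-user backend and pushing the whole thing into the background, but I have not been able to confirm this is the right way to do it (the data directory path below is a placeholder):

# Sketch only: run a database-wide VACUUM FREEZE through the single-user
# backend, detached from the terminal, logging to a file. The backend exits
# on its own when it hits end-of-input after the command finishes.
nohup sh -c 'echo "VACUUM FREEZE;" | postgres --single -D /path/to/data db1' > vacuum_single.log 2>&1 &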
Thanks for your help in advance,
Ganesh Kannan