On 08/08/2016 12:08 AM, Philippe Girolami wrote:
So you are VACUUMing the lesser 'younger' tables?
I VACUUM those with the highest age :
SELECT age,array_agg(table_name) FROM (SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by 2 desc limit 1000) tt group by age order by age desc
• I exit singleuser mode and relaunch the server so I keep on vacuuming the oldest table first by copying
• But I don’t even have time to launch my script : as soon as launch the server, the auto-vacuum daemon kicks in and burns through the transactions : I guess it’s not smart enough to start with
the oldest tables ?
Not understanding; 'the auto-vacuum daemon kicks in and burns through
the transactions'.
Are you saying it is reclaiming xids for you or using them?
If reclaiming that is what is supposed to do and is good thing.
Or am I misunderstanding?
Here is what the logs show when I do what I described above
1) I got 7 transactions back in single user mode
Aug 7 23:40:57 p2 postgres[30376]: [5-1] 2016-08-07 23:40:57 CEST WARNING: database "public" must be vacuumed within 999893 transactions
So the above is from when you enter single user mode?
Aug 7 23:40:57 p2 postgres[30376]: [5-2] 2016-08-07 23:40:57 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
I am not seeing what you do in single user mode?
2) I exit single user mode and restart the database
Aug 7 23:41:40 p2 postgres[15457]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system was shut down at 2016-08-07 23:41:32 CEST
Aug 7 23:41:40 p2 postgres[15458]: [1-1] 2016-08-07 23:41:40 CEST LOG: incomplete startup packet
Aug 7 23:41:40 p2 postgres[15459]: [1-1] 2016-08-07 23:41:40 CEST FATAL: the database system is starting up
Aug 7 23:41:40 p2 postgres[15457]: [2-1] 2016-08-07 23:41:40 CEST WARNING: database with OID 16385 must be vacuumed within 999892 transactions
So you actually lost a transaction.
Aug 7 23:41:40 p2 postgres[15457]: [2-2] 2016-08-07 23:41:40 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
Aug 7 23:41:40 p2 postgres[15457]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:40 p2 postgres[15462]: [1-1] 2016-08-07 23:41:40 CEST LOG: autovacuum launcher started
Aug 7 23:41:40 p2 postgres[15447]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system is ready to accept connections
3) but I don’t even have time to run a query to see the state of the databases
Aug 7 23:41:52 p2 postgres[15487]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15487]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15487]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15487]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
Aug 7 23:41:52 p2 postgres[15497]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15497]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15497]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15497]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: COPY ( SELECT 'VACUUM VERBOSE ' ||c.oid::regclass|| ';--' as _command, greatest ( age ( c.relfrozenxid ) ,age ( t.relfrozenxid ) ) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ( 'r' , 'm' ) order by 2 desc limit 1000 ) TO STDOUT (FORMAT text)
Aug 7 23:41:52 p2 postgres[15518]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15518]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15518]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15518]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
Aug 7 23:41:53 p2 rsyslogd-2177: imuxsock lost 3512 messages from pid 15476 due to rate-limiting
Hmm I wonder what pid 15476(see below) is trying to do that is blowing
out the logging?
Aug 7 23:41:53 p2 postgres[15476]: [2-1] 2016-08-07 23:41:53 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:53 p2 postgres[15476]: [2-2] 2016-08-07 23:41:53 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:53 p2 postgres[15476]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:53 p2 postgres[15476]: [2-4] 2016-08-07 23:41:53 CEST CONTEXT: automatic analyze of table "public.public.aaaaaa"
4) lots of autovacuum failures
It is trying to analyze tables and being refused as are all commands.
Aug 7 23:41:55 p2 postgres[15476]: [3-1] 2016-08-07 23:41:55 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:55 p2 postgres[15476]: [3-2] 2016-08-07 23:41:55 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:55 p2 postgres[15476]: [3-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:55 p2 postgres[15476]: [3-4] 2016-08-07 23:41:55 CEST CONTEXT: automatic analyze of table "public.public.aaaaab"
Aug 7 23:41:56 p2 postgres[15476]: [4-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:56 p2 postgres[15476]: [4-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:56 p2 postgres[15476]: [4-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:56 p2 postgres[15476]: [4-4] 2016-08-07 23:41:56 CEST CONTEXT: automatic analyze of table "public.public.aaaaac"
Aug 7 23:41:56 p2 postgres[15476]: [5-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:56 p2 postgres[15476]: [5-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:56 p2 postgres[15476]: [5-3] #011You might also need to commit or roll back old prepared transactions.
(…)
So it looks like I’m going to have to bite the bullet and really vacuum the whole database instead of just the oldest tables first which will impact our production pipelines.
Or VACUUM the most heavily used tables in the database.
Why would that help ? Aren’t the tables with the highest age the problem ?
I think we are saying the same thing. The tables that have accumulated
the most xid debt.
The next question to be asked is; what is creating the transactions and
is the transaction rate 'normal' or is there a possibility you have a
rogue process or rogue processes in action?
That’s always a possibility but I can’t think of what that would be.
I was guessing that the auto-vacuum wasn’t vacuuming the tables with the highest age first. Isn’t that a possibility ?
Looks to me like it never gets a chance in normal mode to get its work
done. As mentioned before you need to create some headroom to work with.
The logical place to do that would be in single user mode where other
sessions cannot interfere.
Thanks
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general