Re: Finding out the aging tables for vacuuming to prevent data loss on transaction id wraparound

Fabio Pardi <f.pardi@xxxxxxxxxxxx> · Tue, 6 Mar 2018 14:26:58 +0100

>From my experience is good practice to:

- Monitor the age of your records (we use icinga/nagios for that) to know on time when things are going wrong

- Schedule a periodic VACUUM of your db, if possible. Eg: if your db is heavily used during the day but resting at night, then during the night is a good idea to run a VACUUM. This will also offload your db during the day, because less autovacuum jobs will be triggered.

As additional note, if I remember correctly, a vacuum will be triggered to prevent wraparound also if you set 'autovacuum=off' in your configuration. That specific vacuum will have priority over normal vacuums.

For statistical purposes, a good formula to understand your actual 'burn rate' is to periodically run: 

SELECT * FROM txid_current();

and compare results over time to have a good estimate on how many xid you are using daily.

Regards,

Fabio

On 03/06/2018 12:54 PM, Subramanian Krishnan wrote:
> Hello Laurenz,
> 
> Thank you for taking the time out to read through and problem description and responding back.
> 
> After posting the question on the forum, I continued investigating possible root causes.  Through a flash of intuition I suspected if it could be a table which is strictly not a part of the database (not listed in pg_class) which could be contributing factor.
> 
> pg_shdepend fitted the bill. So I did a vacuum on pg_catalog.pg_shdepend and post that re-ran the DB age query. The trick worked and the age came down to match the max age entry in pg_class. To be on the safer side, I did vacuuming of pg_shdescription and pg_shseclabel as well.
> 
> Your suggestion of:
> 1) Making AV more aggressive and
> 2) Tiding over the present problem of aged TOAST tables though manual/scripted VACUUM is valid and we are working on that currently.
> 
> Thanks and Regards,
> Subu
> 
> On Mon, Mar 5, 2018 at 2:54 PM, Laurenz Albe <laurenz.albe@xxxxxxxxxxx <mailto:laurenz.albe@xxxxxxxxxxx>> wrote:
> 
>     Subramanian Krishnan wrote:
>     > We are running PostgreSQL 9.4.14 and recently we received an alert for transaction id
>     > crossing the 1 billion mark. Since we can hit the transaction id wraparound issue for aging tables,
>     > we investigated by running the following queries:
>     >
>     > mpsdb=> select * from txid_current();
>     >  txid_current
>     > --------------
>     >    1100826671
>     > (1 row)
>     >
>     > mpsdb=> SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC LIMIT 20;
>     >   datname  |    age
>     > -----------+------------
>     >  mpsdb     | 1087909867
>     >  rdsadmin  | 1000562517
>     >  postgres  |  950697143
>     >  template0 |  900709601
>     >  template1 |  900697643
>     > (5 rows)
>     >
>     > mpsdb=> SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
>     >     table_name     | type |    age     | relfrozenxid
>     > -------------------+------+------------+--------------
>     >  pg_toast_22985917 | t    | 1042877986 |     57941433
>     >  pg_toast_22986301 | t    | 1042877877 |     57941542
>     >  pg_toast_23823161 | t    | 1041791836 |     59027583
>     >  pg_toast_23823242 | t    | 1041791776 |     59027643
>     >  pg_toast_23904594 | t    | 1041658236 |     59161183
>     >  pg_toast_23904629 | t    | 1041658221 |     59161198
>     >  pg_toast_23904640 | t    | 1041658208 |     59161211
>     >  pg_toast_23904648 | t    | 1041658200 |     59161219
>     >  pg_toast_23904658 | t    | 1041658191 |     59161228
>     >  pg_toast_23904666 | t    | 1041658183 |     59161236
>     >  pg_toast_23904674 | t    | 1041658175 |     59161244
>     >  pg_toast_23904682 | t    | 1041658169 |     59161250
>     >  pg_toast_23904690 | t    | 1041658161 |     59161258
>     >  pg_toast_23904698 | t    | 1041658156 |     59161263
>     >  pg_toast_23904706 | t    | 1041658149 |     59161270
>     >  pg_toast_23904714 | t    | 1041658129 |     59161290
>     >  pg_toast_23904722 | t    | 1041658096 |     59161323
>     >  pg_toast_23904730 | t    | 1041658090 |     59161329
>     >  pg_toast_23904738 | t    | 1041658085 |     59161334
>     >  pg_toast_23904746 | t    | 1041658077 |     59161342
>     > (20 rows)
>     >
>     >
>     > mpsdb=> SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) like '%VACUUM%' ORDER BY xact_start;
>     > datname | usename  |  pid  | waiting |  xact_runtime   |                                 query
>     > ---------+----------+-------+---------+-----------------+------------------------------------------------------------------------
>     >  mpsdb   | rdsadmin | 13833 | f       | 00:00:14.161262 | autovacuum: VACUUM pg_toast.pg_toast_242649675 (to prevent wraparound)
>     >  mpsdb   | rdsadmin | 14565 | f       | 00:00:11.735571 | autovacuum: VACUUM pg_toast.pg_toast_242649757 (to prevent wraparound)
>     >  mpsdb   | rdsadmin | 15315 | f       | 00:00:09.303762 | autovacuum: VACUUM pg_toast.pg_toast_242649819 (to prevent wraparound)
>     >  mpsdb   | rdsadmin | 14935 | f       | 00:00:06.893078 | autovacuum: VACUUM pg_toast.pg_toast_243226657 (to prevent wraparound)
>     >  mpsdb   | rdsadmin | 15851 | f       | 00:00:04.322474 | autovacuum: VACUUM pg_toast.pg_toast_243227582 (to prevent wraparound)
>     >  mpsdb   | rdsadmin | 15615 | f       | 00:00:01.768495 | autovacuum: VACUUM pg_toast.pg_toast_243226332 (to prevent wraparound)
>     >
>     > It is clear that we need to vacuum the most aged pg_toast tables which are not yet picked up by the autovacuum process.
>     >
>     > The question which is bothering us though is:
>     > As per the queries we ran the most aged transaction and table is 1042877986 (pg_toast_22985917).
>     > And as per PostgreSQL pg_database documentation the most aged table/transaction in pg_class determines
>     > age of the database (via the datfrozenxid value). If that is the case we would have expected the age
>     > of mpsdb database to be 1042877986 (or in this range) but what we see as age is 1087909867.
>     >
>     > Does this mean there are other aging tables we are not aware of? If yes, how do we get to those
>     > tables since we have already queried pg_class for the most aged table?
>     >
>     > Request any help/guidance we can get regarding this.
> 
>     That is indeed strange.
> 
>     Reading the code, I see that at the end of a VACUUM, "datfrozenxid" is set
>     to the minimum value of all "relfrozenxid" for all tables in the database
>     obtained with a sequential scan of "pg_class".
> 
>     The only exception is that if a table is found where "relfrozenxid" is in the
>     future, nothing is done.  The comment suggests that that should normally not
>     happen, but "has been known to arise due to bugs in pg_upgrade".
> 
>     - Can you find tables with negative "relfrozenxid" age in "pg_class"?
> 
>     - The other option is that a VACUUM finished between the query of "pg_database"
>       and the query of "pg_class".  Is that an option?
> 
>     You should tune autovacuum to be more aggressive so that it gets done processing
>     the tables.  To get rid of the immediate problem, you could schedule a manual
>     VACUUM of the tables to which the TOAST table belong.
> 
>     Yours,
>     Laurenz Albe
> 
>