Re: four template0 databases after vacuum

Kazuaki Fujikura <fujya@xxxxxxxxx> · Thu, 11 Feb 2016 00:19:37 +0900

Thank you for your comments. 

>Can you explain what your replication set up is? 

Streaming Replication.
=>
master----slave1 (async)
master----slave2 (async)

>So are you doing the below on the master, the slaves or all? 
=>master

Our current plan at this moment is
- Upgrade 9.1.19 from 9.1.6 at the end of March at our scheduled maintenance
- Export and import pg_dump files ( to eliminate the effect of template0 and xid, which are all reset at import)

We are still afraid that it is too late (as we still have 4 strange template0 files). So, if you have any ideas/suggestions which we can try before scheduled maintenance, that is much appreciated.

Best regards,
Kazuaki Fujikura

2016-02-09 1:51 GMT+09:00 Adrian Klaver <adrian.klaver@xxxxxxxxxxx>:
On 02/08/2016 04:16 AM, Kazuaki Fujikura wrote:

Thank you for your comments.

First, I think I need to tell you our database situation

- 3 physical databases (installed in different servers. 1master, 2 slave

servers.)

For future reference the above are generally called database clusters or instances to distinguish them from the databases created inside them, what you call logical databases. Thanks for explaining it helps clear up some confusion on my part.

Can you explain what your replication set up is?

- more than logical 1100 databases in each servers

[Karsten and Melvin]

It shows 0 records in template0 with the query you provided.

===============================

  schema | table | index | idx_scan | idx_tup_read | idx_tup_fetch |

type | pg_get_indexdef | statusi | size_in_bytes | size

--------+-------+-------+----------+--------------+---------------+------+-----------------+---------+---------------+------

(0 rosw)

===============================

So are you doing the below on the master, the slaves or all?

[Adrian]

 >> [problem history/background]

 >>

 >> Jan 10th, 2016:

 >> The first problem was autovacuum issue.

 >> - autovacuum could not finish successfully.

 >> - I set autovacuum_freeze_max_age to 2 hundreds million.

 >> - autovacuum immediately finished against the database which age was

 >> over 2 hundreds million.

 >

 >Which was?

Any logical database (we have more than 1100 databases) which age

(relfrozenxid) is more than 2 hundreds

million shows that autovacuum runs repeatedly (it starts and stops

autovacuum process repeatedly with no

vacuum processing).

 >> - so, autovacuum did not go next database

 >

 >Which was?

- I saw autovacuum stops at template0 because it can not run vacuum

freeze against it

- I thought it was because the age of template0 exceeds the config

parameter of autovacuum kick, which is 2 hundreds milliions

- So, I wanted to reduce the age of template0 (I don't know why it

increases though)

 >Not finish on what?

I could finish vacuum manually.

But, autovacuum was not finished.

 >> This time, the age did not reduce with manual vacuum.

 >Age of what?

age(relfrozenxid) of template0.

 >This I do not get, how the xid count on pg_database so quickly and to

 >such an extent that it needs a VACUUM FULL?

 >Is there a script that is creating and dropping databases rapidly?

We have more than 1100 databases and create new database every day

whenever new customer comes.

Number of transactions are more than ten millions in total of 1100+

database.

 >What are you trying to do with the above?

 >I do not think it is a coincidence that the first time the above was run

 >in this sequence, shortly after 4 template0 databases appear.

I run vacuum full because I could not change the value of relfrozenxid

of pg_database with vacuum/vacuum freeze.

Except template0 database, I can change relfrozenxid if I run vacuum

full pg_database.

 >Can you show?:

 >

 >SELECT * from pg_database;

 >

 >If you do not want to show the whole cluster, then at least the

 >databases involved in this discussion.

===============================

postgres=# SELECT oid,ctid,* from pg_database where datname =  'template0' ;

   oid  |  ctid   |  datname  | datdba | encoding | datcollate |

datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid |

datfrozenxid | dattablespace |

        datacl

-------+---------+-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------

-----------------------------

  12772 | (36,25) | template0 |     10 |        6 | C          | C

  | t             | f            |           -1 |         12772 |

2412920847 |          1663 | {=c/pos

tgres,postgres=CTc/postgres}

  12772 | (36,26) | template0 |     10 |        6 | C          | C

  | t             | f            |           -1 |         12772 |

2264969019 |          1663 | {=c/pos

tgres,postgres=CTc/postgres}

  12772 | (36,27) | template0 |     10 |        6 | C          | C

  | t             | f            |           -1 |         12772 |

2264969019 |          1663 | {=c/pos

tgres,postgres=CTc/postgres}

  12772 | (36,28) | template0 |     10 |        6 | C          | C

  | t             | f            |           -1 |         12772 |

2264969019 |          1663 | {=c/pos

tgres,postgres=CTc/postgres}

(4 rows)

===============================

oid is same value.

But ctid is different values.

The rest of records has our customer name. If you need more info from

here, I can send you the whole data.

No at this point I do not think that is necessary.

Best regards,

Kazuaki Fujikura

-- 

Adrian Klaver

adrian.klaver@xxxxxxxxxxx