Thank you for your comments.
>Can you explain what your replication set up is?
Streaming Replication.
=>
master----slave1 (async)
master----slave2 (async)
>So are you doing the below on the master, the slaves or all?
=>master
Our current plan at this moment is
- Upgrade 9.1.19 from 9.1.6 at the end of March at our scheduled maintenance
- Export and import pg_dump files ( to eliminate the effect of template0 and xid, which are all reset at import)
We are still afraid that it is too late (as we still have 4 strange template0 files). So, if you have any ideas/suggestions which we can try before scheduled maintenance, that is much appreciated.
Best regards,
Kazuaki Fujikura
2016-02-09 1:51 GMT+09:00 Adrian Klaver <adrian.klaver@xxxxxxxxxxx>:
On 02/08/2016 04:16 AM, Kazuaki Fujikura wrote:
Thank you for your comments.
First, I think I need to tell you our database situation
- 3 physical databases (installed in different servers. 1master, 2 slave
servers.)
For future reference the above are generally called database clusters or instances to distinguish them from the databases created inside them, what you call logical databases. Thanks for explaining it helps clear up some confusion on my part.
Can you explain what your replication set up is?
- more than logical 1100 databases in each servers
[Karsten and Melvin]
It shows 0 records in template0 with the query you provided.
===============================
schema | table | index | idx_scan | idx_tup_read | idx_tup_fetch |
type | pg_get_indexdef | statusi | size_in_bytes | size
--------+-------+-------+----------+--------------+---------------+------+-----------------+---------+---------------+------
(0 rosw)
===============================
So are you doing the below on the master, the slaves or all?No at this point I do not think that is necessary.
[Adrian]
>> [problem history/background]
>>
>> Jan 10th, 2016:
>> The first problem was autovacuum issue.
>> - autovacuum could not finish successfully.
>> - I set autovacuum_freeze_max_age to 2 hundreds million.
>> - autovacuum immediately finished against the database which age was
>> over 2 hundreds million.
>
>Which was?
Any logical database (we have more than 1100 databases) which age
(relfrozenxid) is more than 2 hundreds
million shows that autovacuum runs repeatedly (it starts and stops
autovacuum process repeatedly with no
vacuum processing).
>> - so, autovacuum did not go next database
>
>Which was?
- I saw autovacuum stops at template0 because it can not run vacuum
freeze against it
- I thought it was because the age of template0 exceeds the config
parameter of autovacuum kick, which is 2 hundreds milliions
- So, I wanted to reduce the age of template0 (I don't know why it
increases though)
>Not finish on what?
I could finish vacuum manually.
But, autovacuum was not finished.
>> This time, the age did not reduce with manual vacuum.
>Age of what?
age(relfrozenxid) of template0.
>This I do not get, how the xid count on pg_database so quickly and to
>such an extent that it needs a VACUUM FULL?
>Is there a script that is creating and dropping databases rapidly?
We have more than 1100 databases and create new database every day
whenever new customer comes.
Number of transactions are more than ten millions in total of 1100+
database.
>What are you trying to do with the above?
>I do not think it is a coincidence that the first time the above was run
>in this sequence, shortly after 4 template0 databases appear.
I run vacuum full because I could not change the value of relfrozenxid
of pg_database with vacuum/vacuum freeze.
Except template0 database, I can change relfrozenxid if I run vacuum
full pg_database.
>Can you show?:
>
>SELECT * from pg_database;
>
>If you do not want to show the whole cluster, then at least the
>databases involved in this discussion.
===============================
postgres=# SELECT oid,ctid,* from pg_database where datname = 'template0' ;
oid | ctid | datname | datdba | encoding | datcollate |
datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid |
datfrozenxid | dattablespace |
datacl
-------+---------+-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
-----------------------------
12772 | (36,25) | template0 | 10 | 6 | C | C
| t | f | -1 | 12772 |
2412920847 | 1663 | {=c/pos
tgres,postgres=CTc/postgres}
12772 | (36,26) | template0 | 10 | 6 | C | C
| t | f | -1 | 12772 |
2264969019 | 1663 | {=c/pos
tgres,postgres=CTc/postgres}
12772 | (36,27) | template0 | 10 | 6 | C | C
| t | f | -1 | 12772 |
2264969019 | 1663 | {=c/pos
tgres,postgres=CTc/postgres}
12772 | (36,28) | template0 | 10 | 6 | C | C
| t | f | -1 | 12772 |
2264969019 | 1663 | {=c/pos
tgres,postgres=CTc/postgres}
(4 rows)
===============================
oid is same value.
But ctid is different values.
The rest of records has our customer name. If you need more info from
here, I can send you the whole data.