Re: [Skytools-users] WAL Shipping + checkpoint

Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> · Thu, 27 Aug 2009 10:18:01 +1200

Sébastien Lardière wrote:
On 26/08/2009 04:46, Mark Kirkwood wrote:
Sébastien Lardière wrote:
Hi All,

I have a cluster (Pg 8.3.7) with WAL shipping, and a few hours ago 
the master had to be restarted.

I use walmgr from Skytools, which works very well.

I have restarted the master before without any problem, but today 
the slave doesn't behave as I expect. The "Time of latest 
checkpoint" field from pg_controldata on the slave keeps the same 
value, even though WAL files are processed correctly.

I tried restarting the slave, but after it replayed all the WAL 
since "Time of latest checkpoint" again, it did nothing else: the 
latest checkpoint stays at the same value.

I don't know if it's important (I think so), and I can't fix it.

It is normal for it to lag behind somewhat on the slave (depending on 
what your checkpoint timeout etc settings are).

However, I've noticed what you are seeing as well - particularly when 
there are no actual data changes coming through in the logs - the 
slave checkpoint time does not change even though there have been 
checkpoints on the master (I may have a look in the code to see what 
the story really is...if I have time).

Yes, but the delay between the last checkpoint on the master and on 
the slave is now very high (100 000 sec), because the last checkpoint 
on the slave was yesterday (assuming pg_controldata is right).

Here is a graph from our munin plugin: 
http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png

The blue line represents the average time between two WAL files 
processed on the slave, and the green line the delay between the last 
checkpoint on the master and on the slave.

Maybe it's not a good indicator, but the green line makes me think 
there is a problem.
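
For reference, the "green line" value can be computed roughly as 
follows. This is only a sketch, not the actual munin plugin: it 
assumes the pg_controldata output of both servers has already been 
captured as text, and that the timestamp is printed in the C locale 
layout (e.g. "Thu Aug 27 10:18:01 2009"); the format string and the 
function names are illustrative and will need adjusting for other 
locales.

```python
from datetime import datetime

FIELD = "Time of latest checkpoint"
# Assumed timestamp layout (C locale). pg_controldata output is
# locale dependent, so adjust this to match what your servers print.
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def latest_checkpoint(controldata_text, time_format=TIME_FORMAT):
    """Extract the latest checkpoint time from pg_controldata output."""
    for line in controldata_text.splitlines():
        # Split on the first colon only; the timestamp itself has colons.
        name, sep, value = line.partition(":")
        if sep and name.strip() == FIELD:
            return datetime.strptime(value.strip(), time_format)
    raise ValueError("field %r not found" % FIELD)


def checkpoint_lag_seconds(master_text, slave_text):
    """Seconds between the master's and the slave's latest checkpoint."""
    delta = latest_checkpoint(master_text) - latest_checkpoint(slave_text)
    return delta.total_seconds()
```

A lag that keeps growing while WAL files are still being applied is 
the symptom described above.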

Do you have archive_timeout set? If so, then what *could* be happening 
is this:

There are actually no "real" data changes being made on your master for 
some reason. So every time archive_timeout is reached a log full of no 
changes is shipped to your slave and applied - and no checkpoint times 
are changed for reasons I mentioned above.

A way to test this would be to do something that makes real data changes 
in the master. A good thing to try would be to:

- create a new database
- create tables and add some reasonable amount of data (e.g. initialized 
pgbench scale 100).

Then see if your checkpoint time gets updated a few minutes or so later.
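
The test steps above can be sketched as a small script. It is only 
an illustration: the database name "walship_test" is made up, and it 
assumes createdb and pgbench are on the PATH and can connect to the 
master with default settings. By default it only prints the commands.

```python
import subprocess


def load_test_commands(dbname="walship_test", scale=100):
    """Commands that create a database and fill it with pgbench data."""
    return [
        ["createdb", dbname],
        # -i initializes the pgbench tables, -s sets the scale factor.
        ["pgbench", "-i", "-s", str(scale), dbname],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run(load_test_commands(), dry_run=False) on the master should 
generate enough real WAL traffic for the test.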

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)