>>> On Fri, Oct 26, 2007 at 6:39 PM, in message
<1193441976.7624.128.camel@xxxxxxxxxxxxxxxxxxx>, Jeff Davis
<pgsql@xxxxxxxxxxx> wrote:
> On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote:
>> Hmmm... We would actually prefer to get the WAL file at the
>> specified interval. We have software to ensure that the warm
>> standby instances are not getting stale, and that's pretty simple
>> with the current behavior.
>
> Another thought: when you say it's "pretty simple", what do you do now?
> My monitoring scripts for this particular situation employ some pretty
> ugly code.

Here's our script:

#! /bin/bash

if [ "$1" == "" ] ; then
  savepwd=$PWD
  cd /var/pgsql/data/county/
  find * -maxdepth 0 -type d | xargs -idirname $0 dirname
  cd $savepwd
  exit 0
fi

for countyName ; do
  echo County: $countyName
  /usr/local/pgsql/bin/pg_controldata /var/pgsql/data/county/$countyName/data | grep -E '(Database cluster state|pg_control last modified)'
  /etc/init.d/postgresql-${countyName}-cc status
  grep basebackup /var/pgsql/data/county/$countyName/data/basebackup-of-this-instance
  echo ''
done

Here's an example of running it (although the opcenter usually runs it
without a parameter, to get all counties):

opcenter@PGBACKUP:~> sudo pgstatus.sh iowa
County: iowa
Database cluster state: in archive recovery
pg_control last modified: Mon 29 Oct 2007 09:03:16 AM CDT
pg_ctl: server is running (PID: 15902)
/usr/local/pgsql-8.2.4/bin/postgres -D /var/pgsql/data/county/iowa/data
basebackup cc-2007-10-26_190001

This gets parsed by a script in our monitor (python, I think) and winds
up feeding a status display. It's probably a bit crude, but it has
worked well for us, with very little time required to get it going.

This thread has made me aware that it is dependent on the checkpoint
frequency as well as the archive frequency. Our checkpoint_timeout
setting is 30min and our archive_timeout is the default (one hour).
The monitor shows red if the cluster state isn't "in archive recovery",
or pg_ctl doesn't report "server is running", or the pg_control last
modified time is older than 75 minutes.

We are OK with a one hour archive interval because we have a separate
application-oriented transaction stream (independent of the database
product) coming back in real time, which we can replay to "top off" a
database backup.

-Kevin
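
For readers who want to try the red/green rule described in the post, here is a minimal sketch in Python. It is not the actual monitor code, which the post does not show. The names parse_status, is_red, MAX_AGE and SAMPLE are hypothetical. The sketch assumes the timestamp has the same format as in the example output, that the monitor and the standby use the same time zone (the zone abbreviation is simply dropped), and that day and month names are in English.

```python
from datetime import datetime, timedelta

# Threshold taken from the post: red if pg_control is older than 75 minutes.
MAX_AGE = timedelta(minutes=75)

# Example output copied from the post (one county).
SAMPLE = """\
County: iowa
Database cluster state: in archive recovery
pg_control last modified: Mon 29 Oct 2007 09:03:16 AM CDT
pg_ctl: server is running (PID: 15902)
/usr/local/pgsql-8.2.4/bin/postgres -D /var/pgsql/data/county/iowa/data
basebackup cc-2007-10-26_190001
"""


def parse_status(text):
    """Extract the three fields the rule needs from one county's output."""
    fields = {"state": None, "last_modified": None, "running": False}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Database cluster state:"):
            fields["state"] = line.split(":", 1)[1].strip()
        elif line.startswith("pg_control last modified:"):
            stamp = line.split(":", 1)[1].strip()
            # Drop the trailing zone name (e.g. "CDT"); this sketch treats
            # the value as local time shared by monitor and standby.
            stamp = stamp.rsplit(" ", 1)[0]
            fields["last_modified"] = datetime.strptime(
                stamp, "%a %d %b %Y %I:%M:%S %p"
            )
        elif "server is running" in line:
            fields["running"] = True
    return fields


def is_red(fields, now):
    """Apply the three conditions from the post; any failure means red."""
    if fields["state"] != "in archive recovery":
        return True
    if not fields["running"]:
        return True
    if fields["last_modified"] is None:
        return True
    return now - fields["last_modified"] > MAX_AGE


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    # 09:30 is about 27 minutes after the last modification: not red.
    print(is_red(status, datetime(2007, 10, 29, 9, 30)))
    # 10:30 is about 87 minutes after it: red.
    print(is_red(status, datetime(2007, 10, 29, 10, 30)))
```

Passing the current time in as an argument, rather than calling datetime.now() inside is_red, keeps the rule easy to check against a fixed example like the one above.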