Re: [HACKERS] WAL archiving idle database

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Mon, 29 Oct 2007 09:56:37 -0500

>>> On Fri, Oct 26, 2007 at  6:39 PM, in message
<1193441976.7624.128.camel@xxxxxxxxxxxxxxxxxxx>, Jeff Davis <pgsql@xxxxxxxxxxx>
wrote: 
> On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote:
>> Hmmm...  We would actually prefer to get the WAL file at the
>> specified interval.  We have software to ensure that the warm
>> standby instances are not getting stale, and that's pretty simple
>> with the current behavior.
> 
> Another thought: when you say it's "pretty simple", what do you do now?
> My monitoring scripts for this particular situation employ some pretty
> ugly code.

Here's our script:

#! /bin/bash

if [ "$1" == "" ] ; then
  savepwd=$PWD
  cd /var/pgsql/data/county/
  find * -maxdepth 0 -type d | xargs -idirname $0 dirname
  cd $savepwd
  exit 0
fi

for countyName ; do
  echo County: $countyName
  /usr/local/pgsql/bin/pg_controldata /var/pgsql/data/county/$countyName/data | grep -E '(Database cluster state|pg_control last modified)'
  /etc/init.d/postgresql-${countyName}-cc status
  grep basebackup /var/pgsql/data/county/$countyName/data/basebackup-of-this-instance
  echo ''
done

Here's an example of running it (although the opcenter usually runs
it without a parameter, to get all counties):

opcenter@PGBACKUP:~> sudo pgstatus.sh iowa
County: iowa
Database cluster state:               in archive recovery
pg_control last modified:             Mon 29 Oct 2007 09:03:16 AM CDT
pg_ctl: server is running (PID: 15902)
/usr/local/pgsql-8.2.4/bin/postgres -D /var/pgsql/data/county/iowa/data
basebackup    cc-2007-10-26_190001

This gets parsed by a script in our monitor (python, I think) and
winds up feeding a status display.  It's probably a bit crude, but
it has worked well for us, with very little time required to get it
going.  This thread has made me aware that it is dependent on the
checkpoint frequency as well as the archive frequency.  Our
checkpoint_timeout setting is 30min and our archive_timeout is the
default (one hour).  The monitor shows red if the cluster state
isn't "in archvie recovery" or pg_ctl doesn't report "server is
running" or the last modified is older than 75 minutes.

We are OK with a one hour archive interval because we have a
separate application-oriented transaction stream (independent of
the database product) coming back real-time, which we can replay
to "top off" a database backup.

-Kevin

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend