Re: monitoring warm standby lag in 8.4?

Josh Kupershmidt <schmiddy@xxxxxxxxx> · Fri, 10 Dec 2010 14:13:06 -0500

On Fri, Dec 10, 2010 at 11:27 AM, Greg Sabino Mullane <greg@xxxxxxxxxxxx> wrote:
> Correct. But since we cannot connect to a database in recovery mode,
> there are very few options to determine how far 'behind' it is. The
> pg_controldata is what the check_postgres program uses. This offers a
> rough check which is usually sufficient unless you have a very
> inactive database or need very fine grained checking.
>
> A better system would perhaps connect to both ends and examine which
> specific WALs were being shipped and which one was last played, but
> there are no tools I know of that do that. I suspect the reason for
> this is that the pg_controldata check is "good enough". Certainly,
> that's what we are using for many clients via check_postgres, and
> it's been very good at detecting when the replica has problems. Good
> enough that I've never worried about writing a different method,
> anyway. :)

Thanks for the reply.

One simple check I added to my monitoring script, which isn't in the
script here:
  http://www.kennygorman.com/wordpress/?p=249
(nor in check_postgres.pl, judging from a quick look at its
check_checkpoint() function), is a verification that the standby is
actually 'in archive recovery' mode, based on the 'Database cluster
state:' line in the output of pg_controldata.
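
For illustration, here is a minimal sketch of that state check in
Python. It is not the code from my script; the function names and the
data directory path are made up for the example, and it assumes
pg_controldata is on the PATH and prints its usual English
"Label:   value" lines (hence LC_ALL=C).

```python
import os
import subprocess


def parse_controldata(text):
    """Turn pg_controldata's 'Label:   value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in the value survive.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def in_archive_recovery(text):
    """True if the cluster state line reports 'in archive recovery'."""
    state = parse_controldata(text).get("Database cluster state")
    return state == "in archive recovery"


def read_controldata(data_dir, binary="pg_controldata"):
    """Run pg_controldata against data_dir and return its stdout.

    Assumption: the binary is on PATH; LC_ALL=C keeps the labels in English.
    """
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        [binary, data_dir],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout
```

Usage would be something like
in_archive_recovery(read_controldata("/path/to/standby/data")), with
the monitoring script alerting whenever that returns False.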

I was mulling over some ways to add in a reasonable check that the
standby was keeping up with the WAL stream. Comparing WAL file names
on master vs. standby would probably work, but I was also thinking
that a simple directory-size check on the standby's WAL archive
directory would show whether we were receiving WAL files faster than
we could process them.
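
A rough sketch of the directory-size idea, in Python. It assumes that
replayed segments get removed from the standby's archive directory
(e.g. by whatever cleanup the restore command does), so that whatever
is left is the backlog. The function names and the threshold of 20
segments are arbitrary choices for the example, not anything from an
existing tool.

```python
import os
import re

# WAL segment file names are 24 uppercase hex digits; this skips
# .history and .backup files and anything else in the directory.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def pending_wal_segments(archive_dir):
    """Return the sorted WAL segment file names found in archive_dir."""
    return sorted(
        name for name in os.listdir(archive_dir)
        if WAL_SEGMENT_RE.fullmatch(name)
    )


def check_archive_backlog(archive_dir, max_segments=20):
    """Summarize the backlog; 'ok' is False once it exceeds max_segments."""
    segments = pending_wal_segments(archive_dir)
    total_bytes = sum(
        os.path.getsize(os.path.join(archive_dir, name)) for name in segments
    )
    return {
        "ok": len(segments) <= max_segments,
        "segments": len(segments),
        "bytes": total_bytes,
        # Names on the same timeline sort in WAL order, so these bracket
        # the backlog.
        "oldest": segments[0] if segments else None,
        "newest": segments[-1] if segments else None,
    }
```

The "newest" name could also serve as a starting point for the
master-vs-standby file name comparison, if the master side reports
which segment it archived last.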

Josh

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)