Re: Calculating Replication Lag - units

Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> · Tue, 26 Jun 2012 09:13:44 -0700

On 06/26/2012 08:16 AM, David Kerr wrote:
On 06/26/2012 05:11 AM, Stuart Bishop wrote:
On Tue, Jun 26, 2012 at 6:21 AM, David Kerr<dmk@xxxxxxxxxxxxxx>  wrote:
On Mon, Jun 25, 2012 at 02:17:22PM -0700, Steve Crawford wrote:
- On 06/25/2012 01:17 PM, David Kerr wrote:
->Howdy,
->
->When calculating Replication lag, I know that we have to compare the
->pg_current_xlog_location
->to pg_last_xlog_receive_location, etc. but what I'm trying to 
figure out
->is what are
->the units that I'm left with after the calculation.
->
->(i.e., does the xlog_location imply some time value?)
->
->Here's the output of the (slightly modified script)
->Master: 5003964876715
->Receive: 5003964876715
->Replay: 5003964765203
->
->receive.value 0
->apply.value 111512
->
->111512 isn't inherently useful to me on its own.
->
->Any tips?
->
- How about now()-pg_last_xact_replay_timestamp() (however this can 
be a
- large number if there have not been any recent transactions on the
- master). I suppose you could do something like:
-
- case when pg_last_xlog_receive_location() =
- pg_last_xlog_replay_location() then '0 seconds'::interval
- else now()-pg_last_xact_replay_timestamp() end as log_delay;

i don't know for sure that 111512 is a time value.. that's kind of
what i'm wondering. If i knew that it was like miliseconds or something
that would be helpful.

On the hot standby:

     SELECT now()-pg_last_xact_replay_timestamp() AS lag;

This gives you the lag time as a PostgreSQL interval.

(It also might give you a value if you run it on a database that is
not a hot standby if it started in recovery mode).

It seems difficult or impossible to calculate this on the master.

Ah, awesome. I don't need to calculate it on the master so that's 
perfect.

Thanks!

But beware. If your master server has any idle time you may be misled by 
the simple calculation. I was running a pgbench test on a replicated 
pair of machines. It finished this morning sometime so the lag delay 
shows 02:31:11.651118, a value that might set off alarm bells. That's 
why I used the case statement to force the interval to 0 if the replay 
is up-to-date.

I think it is still worthwhile to check pg_stat_replication on the 
master to make sure that it is still *sending* logs and perhaps 
cross-checking the current log position on the master with the replayed 
log location on the standby to see if they are reasonably close.

An additional verification check I've toyed with is to have a cron 
script on the master update a one-row one-column table that holds a 
timestamp and checking that timestamp on the standby(s) to double-check 
that it is not too-far out-of-date. (This would also force regular data 
delivery to the standby so that pg_last_xact_replay_timestamp() should 
not lag far behind on an otherwise idle server.)

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general