Re: Calculating Replication Lag - units

David Kerr <dmk@xxxxxxxxxxxxxx> · Tue, 26 Jun 2012 09:20:27 -0700

On Tue, Jun 26, 2012 at 09:13:44AM -0700, Steve Crawford wrote:
- On 06/26/2012 08:16 AM, David Kerr wrote:
- >On 06/26/2012 05:11 AM, Stuart Bishop wrote:
- >>On Tue, Jun 26, 2012 at 6:21 AM, David Kerr<dmk@xxxxxxxxxxxxxx>  wrote:
- >>>On Mon, Jun 25, 2012 at 02:17:22PM -0700, Steve Crawford wrote:
- >>>- On 06/25/2012 01:17 PM, David Kerr wrote:
- >>>->Howdy,
- >>>->
- >>>->When calculating Replication lag, I know that we have to compare the
- >>>->pg_current_xlog_location
- >>>->to pg_last_xlog_receive_location, etc. but what I'm trying to 
- >>>figure out
- >>>->is what are
- >>>->the units that I'm left with after the calculation.
- >>>->
- >>>->(i.e., does the xlog_location imply some time value?)
- >>>->
- >>>->Here's the output of the (slightly modified script)
- >>>->Master: 5003964876715
- >>>->Receive: 5003964876715
- >>>->Replay: 5003964765203
- >>>->
- >>>->receive.value 0
- >>>->apply.value 111512
- >>>->
- >>>->111512 isn't inherently useful to me on its own.
- >>>->
- >>>->Any tips?
- >>>->
- >>>- How about now()-pg_last_xact_replay_timestamp() (however this can 
- >>>be a
- >>>- large number if there have not been any recent transactions on the
- >>>- master). I suppose you could do something like:
- >>>-
- >>>- case when pg_last_xlog_receive_location() =
- >>>- pg_last_xlog_replay_location() then '0 seconds'::interval
- >>>- else now()-pg_last_xact_replay_timestamp() end as log_delay;
- >>>
- >>>i don't know for sure that 111512 is a time value.. that's kind of
- >>>what i'm wondering. If i knew that it was like miliseconds or something
- >>>that would be helpful.
- >>
- >>On the hot standby:
- >>
- >>     SELECT now()-pg_last_xact_replay_timestamp() AS lag;
- >>
- >>This gives you the lag time as a PostgreSQL interval.
- >>
- >>(It also might give you a value if you run it on a database that is
- >>not a hot standby if it started in recovery mode).
- >>
- >>It seems difficult or impossible to calculate this on the master.
- >>
- >>
- >
- >Ah, awesome. I don't need to calculate it on the master so that's 
- >perfect.
- >
- >Thanks!
- >
- >
- But beware. If your master server has any idle time you may be misled by 
- the simple calculation. I was running a pgbench test on a replicated 
- pair of machines. It finished this morning sometime so the lag delay 
- shows 02:31:11.651118, a value that might set off alarm bells. That's 
- why I used the case statement to force the interval to 0 if the replay 
- is up-to-date.
- 
- I think it is still worthwhile to check pg_stat_replication on the 
- master to make sure that it is still *sending* logs and perhaps 
- cross-checking the current log position on the master with the replayed 
- log location on the standby to see if they are reasonably close.
- 
- An additional verification check I've toyed with is to have a cron 
- script on the master update a one-row one-column table that holds a 
- timestamp and checking that timestamp on the standby(s) to double-check 
- that it is not too-far out-of-date. (This would also force regular data 
- delivery to the standby so that pg_last_xact_replay_timestamp() should 
- not lag far behind on an otherwise idle server.)
- 
- Cheers,
- Steve
- 

I see, thanks. I was completely missing that you were using different functions
than me and that was throwing me off. =)

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general