Re: NOTIFY command impact

Rob Brucks <rob.brucks@xxxxxxxxxxxxx> · Thu, 23 Feb 2017 16:26:10 +0000

Hi Benoit,

Your SQL works fine as long as there are never any communication problems between the master and slave.  But if your slave loses communication with the master, then the SQL you provided
 will not report any lag despite the potential that the slave could be very far behind the master.

This is because that SQL is asking the slave how far it is behind the master based on logs it has received from the master. However if the slave has not received logs from the master it
 cannot tell if the master has simply gone idle or if it has lost communication and is unable to send updates.

By introducing a mechanism to manually force the replication stream to be sent periodically, which is what I use the NOTIFY command to do on the master, I can ensure that the slave is updating
 pg_last_xact_replay_timestamp at least as often as the NOTIFY is executed on the master (say every 30 seconds) and is successfully replayed on the slave.  This enables an accurate measurement of replication lag whatever the circumstances.

If the master has simply gone idle because there is no update activity, the NOTIFY command will still force the replication stream to replay on the slave every 30 seconds and the replay
 timestamp will be updated.

But if communication to the master is lost, for whatever reason, the replay timestamp will not be updated on the slave and then lag can be accurately measured and alerted.

I hope this helps!

--Rob

From: 
Benoit Lobréau <benoit.lobreau@xxxxxxxxx>

Date: Thursday, February 23, 2017 at 9:23 AM

To: "David G. Johnston" <david.g.johnston@xxxxxxxxx>

Cc: Rob Brucks <rob.brucks@xxxxxxxxxxxxx>, Tom Lane <tgl@xxxxxxxxxxxxx>, François Beaulieu <frank@xxxxxxxxx>, Adrian Klaver <adrian.klaver@xxxxxxxxxxx>, "pgsql-general@xxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxx>

Subject: Re: [GENERAL] NOTIFY command impact

Hi, 

I might have missed something. 

This should be enough to solve the problem no ?

SELECT

CASE

WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0

 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER

END

AS replication_lag;

Benoit.