Hi Benoit, Your SQL works fine as long as there are never any communication problems between the master and slave. But if your slave loses communication with the master, then the SQL you provided
will not report any lag despite the potential that the slave could be very far behind the master. This is because that SQL is asking the slave how far it is behind the master based on logs it has received from the master. However if the slave has not received logs from the master it
cannot tell if the master has simply gone idle or if it has lost communication and is unable to send updates. By introducing a mechanism to manually force the replication stream to be sent periodically, which is what I use the NOTIFY command to do on the master, I can ensure that the slave is updating
pg_last_xact_replay_timestamp at least as often as the NOTIFY is executed on the master (say every 30 seconds) and is successfully replayed on the slave. This enables an accurate measurement of replication lag whatever the circumstances. If the master has simply gone idle because there is no update activity, the NOTIFY command will still force the replication stream to replay on the slave every 30 seconds and the replay
timestamp will be updated. But if communication to the master is lost, for whatever reason, the replay timestamp will not be updated on the slave and then lag can be accurately measured and alerted. I hope this helps! --Rob From:
Benoit Lobréau <benoit.lobreau@xxxxxxxxx> Hi, I might have missed something. This should be enough to solve the problem no ? SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER END AS replication_lag; Benoit. |