Re: Safe way to check for PostgreSQL replication delay/lag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/15/2014 12:38 PM, Strahinja Kustudić wrote:
Looking at the documentation and all the blog posts about how to monitor replication delay I don't think there is one good and most importantly safe solution which works all the time.

I have a basic homegrown script, the core of which is below, that runs this query on my servers (running 9.1 - settings may need tweaking for other versions). It is a bash script that runs every minute via cron and sets $standby_delay to a value in seconds. The script sends varying levels of alert depending on the severity of the delay. (We must be sized appropriately since we essentially never get alerts unless we have restarted a server for maintenance.)

The script uses a one-record/one-column table called sync_status to hold a timestamptz. To keep things standard across machines, it determines whether it is a master or replica based on the value of 'transaction_read_only' in pg_settings and either updates sync_status or reads it, along with other xact_replay information, to determine the worst-case delay.

The core of the script writes a query to a temporary file then executes that query. The output will be the worst-case measurement of delay seconds (or zero if the master) which is saved in $standby_delay. An empty value indicates a failure of the script to run. Obviously there is more to the script both in setting the value of creating and setting $tempquery, cleaning up temp files, and generating the appropriate alerts based on the severity of the delay.

Your alert methods and severity levels are up to you. Note that when run by cron every minute there exists a reasonable possibility that the slave can report a 60-second delay when actually caught up so you will need to account for that in setting your alert levels.

standby_delay=$(
psql -q --tuples-only --no-align 2>/dev/null <<EOS
\o ${tempquery}
select
case when setting='on' then
'
with logdelay as
    (
    select
    case when
pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int
    else
(extract(epoch from now())-extract(epoch from pg_last_xact_replay_timestamp()))::int
    end as replicadelay
    union
    select
(extract(epoch from now())-extract(epoch from sync_time))::int as replicadelay
    from
        sync_status
    )
select
    max(replicadelay)
from
    logdelay
;
'
else
'
begin;
delete from sync_status;
insert into sync_status (sync_time) values (now()) returning 0::int as replicadelay;
commit;
'
end
from pg_settings where name='transaction_read_only';
\o
\i ${tempquery}
EOS
)


Cheers,
Steve


--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux