On 10/3/2010 3:44 PM, Karl Denninger wrote: > On 10/3/2010 1:34 AM, Guillaume Lelarge wrote: >> Le 03/10/2010 07:07, Karl Denninger a écrit : >>> On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote: >>>> I hope u checked point #11 >>>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use >>>> >>>> * *11.* You can calculate the replication lag by comparing the >>>> current WAL write location on the primary with the last WAL >>>> location received/replayed by the standby. They can be retrieved >>>> using /pg_current_xlog_location/ on the primary and the >>>> /pg_last_xlog_receive_location///pg_last_xlog_replay_location/ >>>> on the standby, respectively. >>>> >>>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host) >>>> pg_current_xlog_location >>>> -------------------------- >>>> 0/2000000 >>>> (1 row) >>>> >>>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host) >>>> pg_last_xlog_receive_location >>>> ------------------------------- >>>> 0/2000000 >>>> (1 row) >>>> >>>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host) >>>> pg_last_xlog_replay_location >>>> ------------------------------ >>>> 0/2000000 >>>> (1 row) >>>> >>>> Regds >>>> Rajesh Kumar Mallah. >>> Yes, I did. >>> >>> Now how do I get an arithmetic difference between the two? There will >>> (usually) be a small difference between the master and slave on a busy >>> system - what I want to do is query both and if the difference in their >>> locations is greater than some defined size, start raising hell (e.g. >>> sending SMS to people, etc) >>> >>> I can SEE the difference, but I don't see a way to COMPUTE a difference, >>> and there does not appear to be a function that will accept the log file >>> location as an argument for conversion - the one documented for offsets >>> (which might otherwise work) does not work on the slave as I noted. >>> >>> With Slony there was a set of system tables that would tell me how many >>> unapplied changes were in the queue. From this I could determine health >>> - if the number was more than some reasonably-small amount, something >>> was broken and alarms were to be raised. >>> >>> I'm looking for a way to implement the same sort of functionality here. >>> >>> ticker=# select pg_last_xlog_replay_location(); >>> pg_last_xlog_replay_location >>> ------------------------------ >>> 37A/327D1888 >>> (1 row) >>> >>> ticker=# select pg_current_xlog_location(); >>> pg_current_xlog_location >>> -------------------------- >>> 37A/3280DCB8 >>> (1 row) >>> >>> How do I get an arithmetic difference between these two >>> programmatically, and will such always be monoatomically increasing >>> (that is, will they ever roll over, thereby giving me a potential >>> NEGATIVE difference?) >>> >> pgPool-II does that computation. You should check that in its source >> code. File pool_worker_child.c, function check_replication_time_lag(). >> It creates a long value from the LSN returned by these functions. Here >> is the computation: >> >> lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff; >> >> In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and >> 3280DCB8 on the master. The hexadecimal values were first converted to >> their unsigned decimal integer representation (same file, function >> text_to_lsn()). >> >> You should really get a look at this file. It's probably a better >> explanation than what I'm trying to do :) >> > Here's a little program to do it - change "MAX_OFFSET" and modify > (particularly the "notify" function) to suit and place a file called > "CHECK.cluster" in the following form in the home directory of whatever > this runs as (it will look in the home directory of the euid of the > process): > > master-name connection-string-to-connect-to-master > slave1-name slave-string-to-connect > slave2-name slave-string-to-connect > ...... > > It will emit a notification for each slave that is more than > "MAX_OFFSET" behind, or if any slave is not in recovery mode (bad; it > thinks it's a master!) or worse, if the MASTER is in recovery mode, or > if it can't talk to any of the declared nodes. > > Note that "CHECK.cluster" contains connection information so make sure > you run this as someone and with a home directory for that someone > appropriately permitted to prevent leakage of the connection credentials. > > Ain't gonna claim it's elegant, but it was something I was able to > quickly bang out in a few minutes and it works. Stuck in the CRON if > you have the cron's email going somewhere that will get noticed fast > (e.g. a Blackberry email, etc) you can run this on whatever interval you > need and it'll do the job. > > Postgresql's development team can consider this a contribution to the > codebase if someone wants to clean it up a bit and include it. You > could obviously pick up the config file as an argument rather than using > the running euid > of the process to grab it; I'm using it in an environment where the > latter works better for me - YMMV. > I've cleaned this up a bit more and it is now at http://www.denninger.net/check-replication.c Feel free to grab and use it if you think it would be helpful. -- Karl
begin:vcard fn:Karl Denninger n:Denninger;Karl email;internet:karl@xxxxxxxxxxxxx note:Read The Market Ticker at http://market-ticker.org x-mozilla-html:TRUE version:2.1 end:vcard
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general