Synchronous Replication & replay_location

Rob Emery <re-pgsql@xxxxxxxxxxxxxxx> · Tue, 8 Apr 2014 13:10:11 +0100

Hello,

We are currently testing synchronous streaming replication between two PG 9.1 boxes in
a test environment, with the slave as hot_standby.
We appear to have it working correctly (changes appear on both, etc.).

However, using the following query (from http://www.dansketcher.com/2013/01/27/monitoring-postgresql-streaming-replication/):
	SELECT
		client_addr,
		sent_offset - (replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
	FROM (
		SELECT
			client_addr,
			('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
			('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
			('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
			('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
		FROM pg_stat_replication
	) AS s;

the byte_lag is almost never 0; it is usually 816 bytes.

Similarly, the output from pg_stat_replication *very* rarely shows write_location and replay_location as equal.

 procpid | usesysid |   usename   | application_name | client_addr | client_hostname | client_port |        backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+-------------+------------------+-------------+-----------------+-------------+------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   17895 |  7141183 | replication | walreceiver      | 10.2.1.151  |                 |       58017 | 2014-04-04 12:41:16.36911+01 | streaming | 17/E6014228   | 17/E6014228    | 17/E6014228    | 17/E6013F08     |             1 | sync
(1 row)
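
For reference, the gap in the row above can be worked out by hand from sent_location and replay_location. This is only a minimal sketch in Python, assuming the same layout the query relies on (each log id spans 255 * 16 ^ 6 bytes, as on 9.1); the function names are made up for illustration:

```python
# Bytes covered by one xlog id. Assumption: the 255 * 16 ^ 6 factor
# used in the monitoring query above (255 segments of 16 MB each).
BYTES_PER_XLOG_ID = 255 * 16 ** 6


def location_to_bytes(location):
    """Convert an 'xlogid/offset' string such as '17/E6014228' to a byte position."""
    xlog_id, offset = location.split("/")
    # Both halves are hexadecimal.
    return int(xlog_id, 16) * BYTES_PER_XLOG_ID + int(offset, 16)


def byte_lag(sent_location, replay_location):
    """Bytes sent to the standby that it has not yet replayed."""
    return location_to_bytes(sent_location) - location_to_bytes(replay_location)


# Values copied from the pg_stat_replication row above.
print(byte_lag("17/E6014228", "17/E6013F08"))
```

For this particular row the difference is 0x320, i.e. 800 bytes.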

I've been trying to find documentation on the meaning of sent_location, write_location, flush_location and replay_location, but have so far been unsuccessful, especially with regard to the implications of replay_location being < flush_location.

Is this "normal" for synchronous replication?

Many Thanks,
Rob