My cluster has 3 nodes: 1 master and 2 slaves, all running PostgreSQL 9.6.3. I configured repmgr v4.0.4 with repmgrd active.
Today, after a few weeks without problems, I noticed replication lag on one of the slaves. The error in its log indicates that the slave did not get the WAL:
could not receive data from WAL stream: ERROR: requested WAL segment 0000000900002E61000000BD has already been removed
However, when I check whether the WAL was received:
postgres=# select pg_is_in_recovery(),pg_is_xlog_replay_paused(),pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
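 pg_is_in_recovery | pg_is_xlog_replay_paused | pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------+--------------------------+-------------------------------+------------------------------
 t                 | f                        | 2E61/BDF5C000                 | 2E61/BDF5B930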
(1 row)
I also checked the pg_xlog directory:
ls -l ../pg_xlog/0000000900002E61000000BD
-rw------- 1 postgres postgres 16777216 Jul 11 11:13 ../pg_xlog/0000000900002E61000000BD
So the segment file exists.
My question is: why wasn't the WAL replayed?
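As a cross-check that the reported locations really fall inside that segment, the file name can be derived from the timeline and the location. This is only a minimal sketch: it assumes the default 16 MB WAL segment size and takes timeline 9 from the first 8 hex digits of the file name in the error. The function name wal_segment_name is a hypothetical helper, not a PostgreSQL or repmgr API.

```python
# Assumption: default WAL segment size of 16 MB (not confirmed for this cluster).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Return the WAL segment file name that contains the given location.

    Hypothetical helper for illustration only.
    `lsn` is a location string as printed by psql, e.g. '2E61/BDF5C000'.
    """
    high_hex, low_hex = lsn.split("/")
    # Combine the two 32-bit halves into one 64-bit byte position.
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segment_number = position // segment_size
    # Number of segments that fit into one 4 GB "log" (256 for 16 MB segments).
    segments_per_log = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log,
        segment_number % segments_per_log,
    )


# Receive and replay locations from the query output above, timeline 9.
print(wal_segment_name(9, "2E61/BDF5C000"))
print(wal_segment_name(9, "2E61/BDF5B930"))
```

Under these assumptions both locations map to 0000000900002E61000000BD, the same segment named in the error message.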
My repmgr.conf has no recovery-related parameters, only basic settings. This is the recovery.conf file in the data directory:
standby_mode = 'on'
primary_conninfo = 'host=xxxxxxx user=repmgr application_name=''psgsqldb2'' connect_timeout=2'
recovery_target_timeline = 'latest'
Any ideas?