Re: wal exist in slave but getting err requested WAL segment has already been removed

Yes, I can see its contents. However, at the end of the output I get the following message:
pg_xlogdump: FATAL:  error in WAL record at 2E61/BDF59950: invalid magic number 0000 in log segment 0000000000002E61000000BD, offset 16105472
Maybe this is the reason behind it?
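
One way to check where the valid data in the segment ends is to look for the first WAL page whose header starts with a zero magic number, which is what the message above reports. A minimal sketch in Python, assuming the default 8192-byte WAL page size and that the page magic occupies the first two bytes of each page header; the function name is only illustrative and not part of any PostgreSQL tool:

```python
XLOG_BLCKSZ = 8192  # assumption: default WAL page size


def first_zero_magic_page(path, page_size=XLOG_BLCKSZ):
    """Return the byte offset of the first page whose 2-byte magic is zero.

    Returns None if every page in the file starts with a non-zero magic.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            page = f.read(page_size)
            if not page:
                return None  # reached end of file without finding a zero magic
            if page[:2] == b"\x00\x00":
                return offset
            offset += len(page)
```

Calling first_zero_magic_page("../pg_xlog/0000000900002E61000000BD") on the standby would show whether the file holds valid pages only up to some offset, with the rest of the 16 MB file still unwritten.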

2018-07-11 16:39 GMT+03:00 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:
On 11/07/2018 16:32, Mariel Cherkassky wrote:
The WAL is available on the standby, not on the primary. It is already in the pg_xlog directory of the slave...
OK, but apparently it is not complete. Can you see its contents with pg_waldump (or pg_xlogdump)?
Do you have any backup mechanism in place? Any WAL shipping / archiving mechanism ?


2018-07-11 16:26 GMT+03:00 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:
On 11/07/2018 16:09, Mariel Cherkassky wrote:
Hi,
I have a 3-node cluster (1 master and 2 slaves, all on version 9.6.3), managed by repmgr v4.0.4 with repmgrd active.

Today, after a few good weeks, I suddenly noticed lag on one of the slaves, and the error in its log indicated that the slave did not get the WAL:

could not receive data from WAL stream: ERROR:  requested WAL segment 0000000900002E61000000BD has already been removed

However, when I check whether the WAL was received:
postgres=# select pg_is_in_recovery(),pg_is_xlog_replay_paused(),pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
 pg_is_in_recovery | pg_is_xlog_replay_paused | pg_last_xlog_receive_location | pg_last_xlog_replay_location 
-------------------+--------------------------+-------------------------------+------------------------------
 t                 | f                        | 2E61/BDF5C000                 | 2E61/BDF5B930
(1 row)
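
For reference, a location like the ones above can be mapped to a WAL segment file name and a byte offset inside that file. A minimal sketch in Python, assuming the default 16 MB segment size and timeline 9 (taken from the file name in the error message); the function name is made up for illustration:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def lsn_to_segment(lsn, timeline, seg_size=WAL_SEGMENT_SIZE):
    """Map an LSN such as '2E61/BDF5C000' to (segment file name, offset in segment)."""
    high_hex, low_hex = lsn.split("/")
    high = int(high_hex, 16)         # upper 32 bits of the location
    low = int(low_hex, 16)           # lower 32 bits of the location
    seg_in_log = low // seg_size     # segment index within the upper-32-bit "log"
    offset = low % seg_size          # byte offset inside that segment file
    name = "%08X%08X%08X" % (timeline, high, seg_in_log)
    return name, offset


print(lsn_to_segment("2E61/BDF5C000", 9))
# ('0000000900002E61000000BD', 16105472)
```

So the receive location 2E61/BDF5C000 lies in segment 0000000900002E61000000BD at byte offset 16105472, and the replay location 2E61/BDF5B930 lies in the same segment.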

I also checked the pg_xlog directory:
ls -l ../pg_xlog/0000000900002E61000000BD
-rw------- 1 postgres postgres 16777216 Jul 11 11:13 ../pg_xlog/0000000900002E61000000BD

and the xlog file exists.

In which node did you check for the file?
If the file is still available on the primary, try to compare the md5sum of the two copies.
If you have a working WAL shipping method in place, then add the appropriate line to the recovery.conf of your standby:
restore_command = 'rsync somemachine:/somepath/pitr/%f "%p" '
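
The md5sum comparison suggested above can also be done with a few lines of Python if the md5sum utility is not at hand. This is only a sketch; the helper name is hypothetical, and the two paths in the comment are placeholders for the copies of the segment fetched from each node:

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (placeholder paths):
# same = file_md5("primary_copy/0000000900002E61000000BD") == \
#        file_md5("standby_copy/0000000900002E61000000BD")
```

A segment that is still being received on the standby is expected to differ from a completed copy on the primary, so a mismatch by itself does not prove corruption.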

Now my question is: why wasn't the WAL replayed?
In my repmgr.conf I don't have any parameters regarding recovery, just some basic settings. The recovery.conf file in the data directory contains:

standby_mode = 'on'
primary_conninfo = 'host=xxxxxxx user=repmgr application_name=''psgsqldb2'' connect_timeout=2'
recovery_target_timeline = 'latest'


Any idea?


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

