Re: wal exist in slave but getting err requested WAL segment has already been removed

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Wed, 11 Jul 2018 16:44:24 +0300

Yes, I can see its contents. However, at the end of the output I get the following message:
pg_xlogdump: FATAL:  error in WAL record at 2E61/BDF59950: invalid magic number 0000 in log segment 0000000000002E61000000BD, offset 16105472
Could this be the reason?
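As a cross-check, a segment file name plus a byte offset can be turned into an LSN. The sketch below assumes the default 16 MB WAL segment size of 9.6 and the usual 24-hex-digit name layout (timeline, log ID, segment number). The function name is hypothetical and only for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed 9.6 default: 0x1000000 bytes
SEGMENTS_PER_LOG_ID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 segments per log ID


def segment_offset_to_lsn(segment_name: str, offset: int) -> str:
    """Return the LSN ('X/Y' form) of byte `offset` inside WAL segment `segment_name`."""
    if len(segment_name) != 24:
        raise ValueError("WAL segment names are 24 hex digits")
    # Layout: 8 hex digits timeline | 8 hex digits log ID | 8 hex digits segment number.
    log_id = int(segment_name[8:16], 16)
    seg_no = int(segment_name[16:24], 16)
    if seg_no >= SEGMENTS_PER_LOG_ID or not 0 <= offset < WAL_SEGMENT_SIZE:
        raise ValueError("segment number or offset out of range")
    low = seg_no * WAL_SEGMENT_SIZE + offset
    return f"{log_id:X}/{low:X}"


print(segment_offset_to_lsn("0000000000002E61000000BD", 16105472))  # → 2E61/BDF5C000
```

Offset 16105472 is 0xF5C000, so the failing position works out to 2E61/BDF5C000. That is the same value the standby reports further down as pg_last_xlog_receive_location(), which suggests the unreadable part of the file begins exactly where streaming stopped.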

2018-07-11 16:39 GMT+03:00 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:

    On 11/07/2018 16:32, Mariel Cherkassky wrote:

        The WAL is available on the standby, not on the primary. It is
          already in the pg_xlog directory of the slave...

    OK, but apparently it is not complete. Can you see its contents
    with pg_waldump (or pg_xlogdump)?

    Do you have any backup mechanism in place? Any WAL shipping /
    archiving mechanism?

          2018-07-11 16:26 GMT+03:00 Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx>:

                On 11/07/2018 16:09, Mariel Cherkassky wrote:

                    Hi,
                    I have 3 nodes in my cluster (1 master running 9.6.3
                      and 2 slaves running 9.6.3). I configured repmgr
                      v4.0.4 (with repmgrd active).

                    Suddenly today, after a few good weeks, I noticed
                      that one of the slaves was lagging, and the error in
                      its log indicated that the slave didn't get the WAL:

                    could not receive data from WAL stream: ERROR:  requested WAL segment 0000000900002E61000000BD has already been removed

                    However, when I check whether the WAL was received:

                      postgres=# select pg_is_in_recovery(),pg_is_xlog_replay_paused(),pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
                       pg_is_in_recovery | pg_is_xlog_replay_paused | pg_last_xlog_receive_location | pg_last_xlog_replay_location
                      -------------------+--------------------------+-------------------------------+------------------------------
                       t                 | f                        | 2E61/BDF5C000                 | 2E61/BDF5B930
                      (1 row)
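The two locations can be compared numerically: an LSN written as X/Y is a 64-bit byte position whose high 32 bits are X and low 32 bits are Y. On the server, 9.6 provides pg_xlog_location_diff() for this. The helper names below are hypothetical; this is only a small sketch of the same arithmetic.

```python
def parse_lsn(text: str) -> int:
    """Turn an LSN like '2E61/BDF5C000' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Return a - b in bytes, the same quantity pg_xlog_location_diff(a, b) reports."""
    return parse_lsn(a) - parse_lsn(b)


received = "2E61/BDF5C000"  # pg_last_xlog_receive_location() from the output above
replayed = "2E61/BDF5B930"  # pg_last_xlog_replay_location() from the output above
print(lsn_diff(received, replayed))  # → 1744
```

So replay is within 1,744 bytes of what the standby has received. The lag comes from receiving having stopped, not from replay falling far behind.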

                      and I checked the pg_xlog directory:

                        ls -l ../pg_xlog/0000000900002E61000000BD
                        -rw------- 1 postgres postgres 16777216 Jul 11 11:13 ../pg_xlog/0000000900002E61000000BD

                      and the WAL file exists.
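A WAL segment file always has the full segment size (16777216 bytes here), whether it was freshly zero-filled or recycled. The ls output therefore does not show how much valid WAL the file actually contains. One rough way to find where the written part ends is to look for the first page whose header magic is zero, which is what pg_xlogdump's "invalid magic number 0000" refers to. This sketch assumes the default 8 kB WAL page size (XLOG_BLCKSZ) and checks only the 2-byte xlp_magic field at the start of each page. It is not a WAL validator, and the script name in the comment is hypothetical.

```python
import sys

XLOG_BLCKSZ = 8192  # assumed WAL page size (the compile-time default)


def first_zero_magic_page(data: bytes, page_size: int = XLOG_BLCKSZ):
    """Return the byte offset of the first page whose 2-byte header magic is zero.

    Each WAL page starts with a header whose first field (xlp_magic) is a
    16-bit value; zero there means nothing was ever written to that page.
    Returns None if every page header has a non-zero magic.
    """
    for offset in range(0, len(data), page_size):
        if data[offset:offset + 2] == b"\x00\x00":
            return offset
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python3 find_wal_end.py pg_xlog/<segment file>
    with open(sys.argv[1], "rb") as f:
        hit = first_zero_magic_page(f.read())
    if hit is None:
        print("every page header has a non-zero magic")
    else:
        print(f"first zeroed page header at offset {hit} (0x{hit:X})")
```

The offset pg_xlogdump reported, 16105472, is exactly 1966 * 8192, so it falls on a page boundary. That is consistent with this kind of check.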

               On which node did you check for the file?

              If the file is still available on the primary, try
              comparing the md5sum of both copies.
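Where md5sum is not handy, the same digest can be computed in Python. This minimal sketch reads the file in chunks. It assumes both copies are reachable from one machine (otherwise, run it on each node and compare the printed digests), and the script name is hypothetical.

```python
import hashlib
import sys


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file (the value md5sum prints), read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python3 compare_md5.py <primary copy> <standby copy>
    a, b = file_md5(sys.argv[1]), file_md5(sys.argv[2])
    print("identical" if a == b else f"differ: {a} vs {b}")
```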

              If you have a working WAL shipping method in place, then
              add the appropriate line to the recovery.conf of your
              standby:

              restore_command = 'rsync somemachine:/somepath/pitr/%f "%p" '
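In restore_command, %f is replaced by the name of the WAL file to fetch, %p by the path to copy it to, and %% by a literal %. The sketch below only mimics that substitution so the resulting command is easy to see. The function is hypothetical, %r (the last restart point file) is not handled, and PostgreSQL then runs the resulting string through the shell, which this does not do.

```python
def expand_restore_command(template: str, wal_file: str, dest_path: str) -> str:
    """Substitute %f, %p and %% in a restore_command template (illustrative sketch)."""
    replacements = {"f": wal_file, "p": dest_path, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in replacements:
            out.append(replacements[template[i + 1]])
            i += 2
        else:
            # Any other character (including an unrecognised %x) is copied as-is.
            out.append(ch)
            i += 1
    return "".join(out)


# The destination path here is only an illustrative value.
print(expand_restore_command('rsync somemachine:/somepath/pitr/%f "%p" ',
                             "0000000900002E61000000BD", "pg_xlog/RECOVERYXLOG"))
```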

                      Now my question is: why wasn't the WAL replayed?
                      In my repmgr.conf I don't have any parameters
                        regarding recovery, just some basic settings. The
                        recovery.conf file in the data directory:

                        standby_mode = 'on'
                        primary_conninfo = 'host=xxxxxxx user=repmgr application_name=''psgsqldb2'' connect_timeout=2'
                        recovery_target_timeline = 'latest'

                      Any idea?

                  -- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
