archive_command too slow.

Joao Junior <jcoj2006@xxxxxxxxx> · Wed, 2 Nov 2016 20:06:05 +0100

Hi friends,

I am running two Linux machines, kernel 3.13.0-45-generic #74-Ubuntu SMP.
PostgreSQL 9.4 is installed on both machines, in a hot standby scenario.

The master/slave setup ships WAL files; it does not use streaming replication.

The archive_command on the master is:

archive_command = '/usr/bin/rsync -a -e "ssh" "%p" slave:/data2/postgres/standby/main/incoming/"%f"' #
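For context, the 9.4 archiver runs archive_command once per completed segment, one segment at a time. It treats a segment as archived only when the command exits with status 0; otherwise the .ready marker stays and the segment is retried later. Below is a minimal Python sketch of that contract. It assumes a local archive directory instead of the rsync-over-ssh target above, and the function name, script name and argument order are hypothetical:

```python
import os
import shutil
import sys


def archive_segment(wal_path: str, wal_name: str, archive_dir: str) -> int:
    """Hypothetical local stand-in for archive_command: copy %p to archive_dir/%f.

    Returns 0 only when the segment was copied completely; any other value
    means the archiver keeps the .ready file and retries the segment later.
    """
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        # Never overwrite a segment that is already in the archive.
        return 1
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(wal_path, tmp)
        # Rename into place so a reader never sees a half-written segment.
        os.replace(tmp, dest)
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical wiring: archive_command = 'python3 archive_segment.py "%p" "%f" /some/archive'
    sys.exit(archive_segment(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Because the archiver waits for each call to return before starting the next one, the per-call cost (here a copy, in the real setup an rsync plus ssh session) bounds how many segments per hour can be archived.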

The recovery.conf on the slave is:

standby_mode = 'on'
restore_command = 'cp /data2/postgres/standby/main/incoming/%f "%p"'
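On the standby side, restore_command is asked for one segment name (%f) and must copy it to %p. In standby_mode a nonzero exit for a segment that has not arrived yet is expected rather than fatal: the standby waits and asks again. A minimal Python sketch of what the cp command above does, with hypothetical function and parameter names:

```python
import os
import shutil


def restore_segment(wal_name: str, target_path: str, incoming_dir: str) -> int:
    """Hypothetical stand-in for: cp /data2/postgres/standby/main/incoming/%f "%p"

    wal_name plays the role of %f and target_path the role of %p.
    """
    src = os.path.join(incoming_dir, wal_name)
    if not os.path.isfile(src):
        # Same situation as "cp: cannot stat ...": the master has not shipped
        # this segment yet. In standby mode this nonzero exit is not fatal;
        # the standby waits and requests the segment again.
        return 1
    shutil.copyfile(src, target_path)
    return 0
```

In other words, the restore side can only replay as fast as segments land in the incoming directory.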

We have an intensive write workload that generates, for example, 1577 WAL segments per hour (~26 segments per minute).
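Since the archiver handles one segment at a time, that rate translates directly into a time budget per archive_command call. A quick Python calculation, assuming the default 16 MB segment size of a 9.4 build (the function name is hypothetical):

```python
SEGMENT_MB = 16  # assumption: default WAL segment size for a 9.4 build


def archive_budget(segments_per_hour: float, segment_mb: int = SEGMENT_MB) -> tuple[float, float, float]:
    """Return (segments per minute, seconds available per segment, MB/s to sustain)."""
    per_minute = segments_per_hour / 60
    seconds_per_segment = 3600 / segments_per_hour
    mb_per_second = segments_per_hour * segment_mb / 3600
    return per_minute, seconds_per_segment, mb_per_second


per_minute, seconds_per_segment, mb_per_second = archive_budget(1577)
print(f"{per_minute:.1f} segments/min, {seconds_per_segment:.2f} s per segment, {mb_per_second:.1f} MB/s")
# 26.3 segments/min, 2.28 s per segment, 7.0 MB/s
```

So each archive_command call has to finish in about 2.3 seconds on average, sustained, just to keep pace; any slower and the backlog grows.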

The slave is more than 20 hours behind the master.
On the master, I can see that all the WAL segments are in the ready state, waiting for archive_command to process them.
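The ready state can be measured directly. In 9.4 each segment waiting for archive_command has a <segment>.ready marker in $PGDATA/pg_xlog/archive_status, which becomes .done once the command succeeds. A small Python sketch that lists and sizes the backlog, assuming that layout and 16 MB segments (function names are hypothetical):

```python
import os

SEGMENT_MB = 16  # assumption: default 16 MB segments


def archive_backlog(archive_status_dir: str) -> list[str]:
    """List WAL files still marked .ready, i.e. not yet accepted by archive_command.

    Assumes the 9.4 layout, where the directory is $PGDATA/pg_xlog/archive_status.
    Names sort chronologically within a single timeline.
    """
    suffix = ".ready"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(archive_status_dir)
        if name.endswith(suffix)
    )


def backlog_mb(archive_status_dir: str) -> int:
    """Approximate size of the unarchived backlog in MB."""
    return len(archive_backlog(archive_status_dir)) * SEGMENT_MB
```

Sampling this a few minutes apart shows whether the backlog is growing or shrinking.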

The slave is waiting for the WAL files, as its log shows:

2016-11-02 18:57:48 UTC::@:[15698]: LOG:  unexpected pageaddr C955/C5000000 in log segment 000000010000C96000000023, offset 0
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000022" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000023" from archive
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory
2016-11-02 18:57:54 UTC::@:[15698]: LOG:  unexpected pageaddr C956/71000000 in log segment 000000010000C96000000025, offset 0
2016-11-02 18:57:58 UTC::@:[15698]: LOG:  restored log file "000000010000C96000000024" from archive
cp: cannot stat ‘/data2/postgres/standby/main/incoming/000000010000C96000000025’: No such file or directory
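The segment names in this log encode a position: 8 hex digits of timeline, 8 of log id and 8 of segment within that log id. LSNs such as C955/C5000000 can be mapped to the same scale. A Python sketch, assuming the 9.4 naming scheme and 16 MB segments (256 per log id), that converts both into absolute segment numbers so the gap between two positions can be counted. The helper names are hypothetical; the master's current LSN could come from pg_current_xlog_location() on 9.4.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # assumption: 16 MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // SEGMENT_BYTES  # 256 with 16 MB segments


def segment_number_from_name(wal_name: str) -> tuple[int, int]:
    """Split a 24-hex-digit WAL file name into (timeline, absolute segment number)."""
    if len(wal_name) != 24:
        raise ValueError(f"not a WAL segment name: {wal_name!r}")
    timeline = int(wal_name[0:8], 16)
    log_id = int(wal_name[8:16], 16)
    seg = int(wal_name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_XLOGID + seg


def segment_number_from_lsn(lsn: str) -> int:
    """Convert an LSN such as 'C955/C5000000' to its absolute segment number."""
    hi, lo = lsn.split("/")
    byte_pos = (int(hi, 16) << 32) | int(lo, 16)
    return byte_pos // SEGMENT_BYTES


def segments_behind(master_lsn: str, standby_wal_name: str) -> int:
    """Difference in segments between the master's LSN and a segment the standby wants."""
    _, standby_seg = segment_number_from_name(standby_wal_name)
    return segment_number_from_lsn(master_lsn) - standby_seg
```

Multiplying the result by 16 MB gives an approximate byte lag between the two positions.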

It seems that archive_command is very slow compared with the rate at which WAL segments are generated.
Any suggestions? Should I use another strategy to speed up the archive_command process?

Best Regards,