Thanks for the replay.....
And how the recovery_target_timeline = 'latest' should minimize the promotion time??
Thanks,
ShayC
Can you please explain the long failover according to the log:
for example:
[2015-12-12 20:35:10.769 IST] LOG: trigger file found: /home/postgres/databases/fabrix/trigger
[2015-12-12 20:35:10.769 IST] FATAL: terminating walreceiver process due to administrator command
scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory
[2015-12-12 20:35:10.893 IST] LOG: record with zero length at 2/DC000168
[2015-12-12 20:35:10.893 IST] LOG: redo done at 2/DC000100
scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory
scp: /home/postgres/archive/0000009300000002000000DC: No such file or directory
scp: /home/postgres/archive/0000009200000002000000DC: No such file or directory
.
.
.
scp: /home/postgres/archive/0000009100000002000000DC: No such file or directory
scp: /home/postgres/archive/0000009000000002000000DC: No such file or directory
scp: /home/postgres/archive/00000095.history: No such file or directory
[2015-12-12 20:35:11.801 IST] LOG: selected new timeline ID: 149
[2015-12-12 20:35:11.931 IST] LOG: restored log file "00000094.history" from archive
[2015-12-12 20:35:12.173 IST] LOG: archive recovery complete
[2015-12-12 20:35:12.181 IST] LOG: database system is ready to accept connections
[2015-12-12 20:35:12.181 IST] LOG: autovacuum launcher started
And how the recovery_target_timeline = 'latest' should minimize the promotion time??
Thanks,
ShayC
On Fri, Jan 1, 2016 at 12:55 PM, Vladimir Borodin <root@xxxxxxxxxxx> wrote:
1 янв. 2016 г., в 9:29, Shay Cohavi <cohavisi@xxxxxxxxx> написал(а):Hi,I have postgresql 9.3 setup with 2 nodes (active/standby with streaming replication & continuos archiving).I have created 2 failover & failback script in order to perform a switchover between the DB servers:1. failover - create a trigger file in order to promote the new primary.2. failback - perform a base backup as mentions in :a. start backup on the primary.b. stop the failed node .didn't delete the DB directory on the failed nodec. performing rsync between the nodes.d.stopping the backup on the primary.e.performing rsync on the pg_xlog.f. creating a recovery.confstandby_mode = 'on'primary_conninfo = 'host=10.50.1.153 port=5432 user=usr password=pass'restore_command = 'scp 10.50.1.153:/home/postgres/archive/%f %p'trigger_file = '/home/postgres/databases/fabrix/trigger'archive_cleanup_command = 'ssh 10.50.1.153 /home/postgres/pg_utils/archive_cleanup.sh %r'g. starting the failed node as secondary.the switchover method:1. stop the primary node.2. promote the secondary node (failover.sh).3. perform failback on the failed node.4. start the failed node.this method works great!but if I perform multiple switchovers (>20), each time the new primary gets promoted (trigger file) - it takes longer because it searches the timelines on the archive.is there any way to prevent the multiple 'scp' archive commands which makes the promotion longer!for example:[2015-12-12 20:35:10.769 IST] LOG: trigger file found: /home/postgres/databases/fabrix/trigger[2015-12-12 20:35:10.769 IST] FATAL: terminating walreceiver process due to administrator commandscp: /home/postgres/archive/0000009400000002000000DC: No such file or directory[2015-12-12 20:35:10.893 IST] LOG: record with zero length at 2/DC000168[2015-12-12 20:35:10.893 IST] LOG: redo done at 2/DC000100scp: /home/postgres/archive/0000009400000002000000DC: No such file or directoryscp: /home/postgres/archive/0000009300000002000000DC: No such file or directoryscp: /home/postgres/archive/0000009200000002000000DC: No such file or directory...scp: /home/postgres/archive/0000009100000002000000DC: No such file or directoryscp: /home/postgres/archive/0000009000000002000000DC: No such file or directoryscp: /home/postgres/archive/00000095.history: No such file or directory[2015-12-12 20:35:11.801 IST] LOG: selected new timeline ID: 149[2015-12-12 20:35:11.931 IST] LOG: restored log file "00000094.history" from archive[2015-12-12 20:35:12.173 IST] LOG: archive recovery complete[2015-12-12 20:35:12.181 IST] LOG: database system is ready to accept connections[2015-12-12 20:35:12.181 IST] LOG: autovacuum launcher startedthis could take for a least 1 min.....or more.is there any way to skip the timeline searching in order to decrease the promotion?You should add recovery_target_timeline = 'latest' to your recovery.conf [0].<...>If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby.<...>Thanks,ShayC