We currently run a PostgreSQL 9.1.5 cluster using streaming replication. We have 3 nodes right now:
2 - local, set up with Pacemaker as an HA master/slave failover cluster
1 - remote, as a DR node.
Currently we're syncing with the pretty standard routine:
clear local datadir
pg_start_backup
sync datadir with fast-archiver (https://github.com/replicon/fast-archiver)
pg_stop_backup
start slave
We use streaming replication with wal_keep_segments set to 1000 so that the slaves can get the WAL files they need.
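For a rough sense of what that setting buys us, here is a minimal sketch of the arithmetic, assuming the default 16 MB WAL segment size. The WAL generation rate is a made-up figure for illustration, not something we measured, and the function names are hypothetical.

```python
# Rough arithmetic for wal_keep_segments.
# Assumption: segments are the default 16 MB each.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments: int) -> int:
    """Minimum amount of past WAL the master keeps for slaves, in bytes."""
    return wal_keep_segments * WAL_SEGMENT_BYTES


def sync_window_seconds(wal_keep_segments: int, wal_bytes_per_second: float) -> float:
    """Roughly how long a sync may take before needed WAL could be recycled."""
    return retained_wal_bytes(wal_keep_segments) / wal_bytes_per_second


if __name__ == "__main__":
    keep = 1000                 # value from our configuration
    rate = 2 * 1024 * 1024      # hypothetical: 2 MB of WAL written per second
    print(retained_wal_bytes(keep) / 1024 ** 3, "GiB retained")
    print(sync_window_seconds(keep, rate) / 3600, "hours to finish the sync")
```

If the datadir copy takes longer than that window, the slave would not find its starting WAL on the master any more, so the rate is the number worth measuring.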
With this procedure we can only sync one slave at a time after a failover. When the second machine starts its sync, it errors out because pg_start_backup fails while the first backup is still in progress.
We're looking into ways to allow both the slave and the DR to sync at the same time.
The procedure I'm currently testing is:
clear local datadir
pg_start_backup
scp datadir/backup_label
pg_stop_backup
sync datadir with fast-archiver
start slave
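The steps above can be written down as a dry-run plan to make the ordering explicit. This is only a sketch: it builds a list of steps and executes nothing, the host name, paths and label are hypothetical, and the fast-archiver and "start slave" steps are left as plain descriptions because their exact invocations depend on our setup.

```python
# Dry-run sketch of the procedure being tested: builds the step list only.
from typing import List, Tuple

Step = Tuple[str, str]  # (where it runs, what is run)


def concurrent_sync_plan(master: str, slave_datadir: str, label: str) -> List[Step]:
    """Return the ordered steps for one slave; nothing is executed."""
    return [
        ("slave", f"clear {slave_datadir}"),
        ("master", f"psql -c \"SELECT pg_start_backup('{label}')\""),
        # backup_label only exists on the master between start and stop
        ("slave", f"scp {master}:<master datadir>/backup_label {slave_datadir}/"),
        ("master", "psql -c \"SELECT pg_stop_backup()\""),
        ("slave", f"sync {slave_datadir} from {master} with fast-archiver"),
        ("slave", "start slave"),
    ]


if __name__ == "__main__":
    # Hypothetical host, path and label, for illustration only.
    for where, what in concurrent_sync_plan("db-master", "/srv/pgdata", "resync"):
        print(f"{where:>6}: {what}")
```

The point of the ordering is that pg_stop_backup runs before the long copy, so the backup window on the master is only as long as one scp and a second machine can open its own window while the first is still copying.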
This seems to be working: the slave comes up correctly and streams the WAL files it needs, starting from the location recorded in the backup_label that was copied between pg_start_backup and pg_stop_backup.
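To double-check which WAL segment the slave will ask for first, the copied backup_label can be read directly. The sketch below assumes the START WAL LOCATION line has its usual form; the sample text is made up and abbreviated, and first_needed_segment is a hypothetical helper name.

```python
# Sketch: pull the starting WAL segment name out of a copied backup_label.
import re
from typing import Optional

# Matches e.g. "START WAL LOCATION: 0/9000020 (file 000000010000000000000009)"
START_RE = re.compile(
    r"^START WAL LOCATION: (\S+) \(file ([0-9A-F]{24})\)", re.MULTILINE
)


def first_needed_segment(backup_label_text: str) -> Optional[str]:
    """Return the segment file name from the START WAL LOCATION line, if any."""
    match = START_RE.search(backup_label_text)
    return match.group(2) if match else None


# Made-up, abbreviated sample; a real backup_label has more lines.
sample = (
    "START WAL LOCATION: 0/9000020 (file 000000010000000000000009)\n"
    "CHECKPOINT LOCATION: 0/9000058\n"
    "LABEL: resync\n"
)

if __name__ == "__main__":
    print(first_needed_segment(sample))
```

Comparing that name against the oldest segment still present on the master before starting the slave would show whether the copy finished inside the wal_keep_segments window.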
Is there any hidden issue with this that we haven't seen? Or does anyone have suggestions for an alternate procedure that would allow 2 slaves to sync concurrently?
Thanks