I manage PostgreSQL v12.8 database clusters.
Our database clusters are on Linux VMs, with OS:
Flavor: redhat_7
Release: 3.10.0-1160.45.1.el7.x86_64
We have repmgr clusters of 1 Primary & 2 Standby servers & use another server with PgBouncer to direct the connections to the current Primary.
The PostgreSQL servers have asynchronous replication & use repmgr to handle automatic failovers.
Any failovers have always promoted the 1st Standby server in the cluster.
We did have one time where the newly promoted server almost immediately went down & the 2nd Standby was promoted.
All three servers in the cluster currently have a priority of 100.
I've been requested to set up DR for the Production cluster. My Google searches haven't found anything describing how DR should be implemented. I have seen that there should always be an odd number of servers in the cluster.
My thoughts are to have a new VM created in our DR datacenter that will replace the 2nd Standby server in the cluster. I'm thinking that the DR server would have a lower priority in the repmgr.conf file (50 instead of 100), since we would want the local
Standby to be promoted first. We have failover configured to be automatic, but it waits 80 seconds before promoting a Standby (to avoid a premature failover due to network flapping).
Is there any reason to change the failover configuration to manual for DR?
I would think in a DR situation we would want it to be automatic, but my colleague disagrees. I'm thinking that if the Primary & the 1st Standby both go down, even if it isn't a real "DR" situation, we would still want it to promote to the 3rd (DR) server
to prevent an outage. Our failover script performs post-promote tasks, including redirecting the PgBouncer server to point to the new Primary, so it shouldn't matter if it was pointing to the DR server or a local server.
If we do decide to make failover manual, can the configuration specify automatic for the local servers & manual for the DR server, or is it an all or nothing type of configuration?
I guess another strategy could be to configure it as a stand-alone server & leave our current cluster intact. Then copy all the backup & WAL files to the DR server, but then in a DR situation we would have to perform the restore before the databases could
be available.
Any advice on a good DR strategy would be appreciated.
Thanks, Karin Hilbert
|