Re: Failover / repmgr questions

Norbert Poellmann <np@xxxxxx> · Wed, 1 Feb 2023 16:35:15 +0100

On Tue, Jan 24, 2023 at 12:26:09PM -0700, Sbob wrote:
> All;
> 
> 
> we are constructing a PostgreSQL based architecture that is more complex 
> than "promote node B and manage the IP"  for a failover.
> 
> It would be ideal if I could deploy repmgr and have repmgr run a script 
> for me if it detects a failure, but it looks like this may not be an 
> option. Anyone know if I can force repmgr to do this? Is there a better 
> tool that does?

Hi,

The repmgr configuration option "promote_command"
(see https://repmgr.org/docs/repmgr.html) is the way to go.

For that, you need repmgrd (note the "d" for the daemon part of repmgr)
running on all PostgreSQL servers in your cluster and, additionally, on a
witness node, which is not part of the PostgreSQL replication but keeps a
copy of the separate "repmgr" metadata database.

For promote_command, you can create a script that handles all the necessary
details. Example:

# repmgr.conf
promote_command="/usr/local/bin/promote hostname123"
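A slightly fuller per-node repmgr.conf could look like the sketch below. The node ID, conninfo and data_directory values are placeholders. failover='automatic' is what lets repmgrd act on a failed primary by itself, and repmgr also expects a follow_command so that the remaining standbys re-attach to the new primary (%n stands for the new primary's node ID).

```ini
# /etc/repmgr.conf on hostname123 (placeholder values)
node_id=2
node_name='hostname123'
conninfo='host=hostname123 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/15/main'

# let repmgrd handle a failed primary without manual intervention
failover='automatic'
promote_command='/usr/local/bin/promote hostname123'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n'
```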

In many environments, the database clients need to know which database
server is the current primary. In case of a failover, the primary role
moves from the old primary database server to the new primary database
server (elected by repmgrd). You will want to tell the clients about the move.

In case of failover, repmgrd triggers the /usr/local/bin/promote script on
the node that becomes the new primary, which in the given example is
"hostname123" (each node's own repmgr.conf passes its own hostname).

Inside your /usr/local/bin/promote script, you can switch the database
clients or database proxies (for example pgbouncer or haproxy) so that they
point to the new primary node.
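As a hedged illustration of the proxy step, the bash sketch below rewrites the host of one [databases] entry in a pgbouncer.ini and asks PgBouncer to reload. The function names, the appdb alias, the admin user "pgbouncer" and port 6432 are assumptions, and it presumes the ini file is editable from wherever the script runs (otherwise wrap the calls in ssh).

```shell
#!/usr/bin/env bash
# Sketch of a client-switching step for the promote script (hypothetical
# helper names). Assumes pgbouncer.ini has one-line [databases] entries like:
#   appdb = host=hostname122 port=5432 dbname=appdb
set -euo pipefail

repoint_pgbouncer_entry() {
  local ini="$1" alias="$2" new_host="$3"
  # Swap only the host=... value on the line starting with "<alias> =".
  # alias is assumed to contain only letters, digits and underscores.
  # A backup of the original file is kept as "$ini.bak".
  sed -i.bak -E "s/^(${alias}[[:space:]]*=.*host=)[^[:space:]]+/\1${new_host}/" "$ini"
}

reload_pgbouncer() {
  # RELOAD on the PgBouncer admin console re-reads the config file.
  # Host, port 6432 and admin user "pgbouncer" are assumptions.
  psql -h "${PGBOUNCER_HOST:-127.0.0.1}" -p "${PGBOUNCER_PORT:-6432}" \
       -U pgbouncer -d pgbouncer -c 'RELOAD;'
}
```

Inside the promote script this might be called as `repoint_pgbouncer_entry /etc/pgbouncer/pgbouncer.ini appdb "$1" && reload_pgbouncer` (the ini path is also an assumption). Check the PgBouncer documentation for how your version treats already open server connections after a RELOAD.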

If that succeeds, you can call the core server switching command as one of
the last commands in your script, for example:

	/usr/bin/repmgr standby promote -f /etc/repmgr.conf 
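The order described above (switch clients first, promote last, and only if the switch worked) can be sketched as a small bash wrapper. The log helper, the switch_clients hook and the REPMGR_BIN/REPMGR_CONF overrides are hypothetical additions for illustration; only the final repmgr call comes from this mail.

```shell
#!/usr/bin/env bash
# Sketch of /usr/local/bin/promote, run by repmgrd as promote_command.
# Switch clients first, promote PostgreSQL last.
set -euo pipefail

log() {
  # Log to syslog when available; always echo to stderr as well.
  logger -t promote -- "$*" 2>/dev/null || true
  echo "promote: $*" >&2
}

switch_clients() {
  # Hypothetical hook: repoint pgbouncer/haproxy/DNS to "$1" here.
  # Return non-zero explicitly to stop before the server is promoted.
  log "switching clients to $1"
}

run_promote() {
  local node="$1"
  if ! switch_clients "$node"; then
    log "client switch to $node failed; not promoting"
    return 1
  fi
  log "promoting $node"
  # Defaults match the command quoted in the mail; overridable for testing.
  "${REPMGR_BIN:-/usr/bin/repmgr}" standby promote -f "${REPMGR_CONF:-/etc/repmgr.conf}"
}
```

In the real script the final line would be something like `run_promote "${1:?usage: promote <node>}"`. Because `repmgr standby promote` is the last command, its exit status becomes the script's exit status, which is what repmgrd gets back.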

A last comment on repmgr: it does not prevent you, in every case, from ending
up with multiple primary servers, although it does warn about that, for
example in the output of the "repmgr cluster show" command.
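As an independent cross-check (for example from a monitoring host), a bash sketch can ask every node whether it is in recovery: a node where pg_is_in_recovery() returns false accepts writes. The function names, host names and the "postgres" database are assumptions, and credentials are expected from the usual libpq environment or .pgpass. Note that unreachable nodes are skipped, so an old primary cut off by a network partition will not show up here.

```shell
#!/usr/bin/env bash
# Sketch: report which nodes currently accept writes, i.e. where
# pg_is_in_recovery() returns false. Hypothetical helper names.
set -euo pipefail

list_primaries() {
  local node state
  for node in "$@"; do
    # -A (unaligned) + -t (tuples only) make psql print just "t" or "f".
    # Unreachable nodes are skipped, so a partitioned old primary is NOT seen.
    state=$(psql -h "$node" -d postgres -Atc 'SELECT pg_is_in_recovery()' 2>/dev/null) || continue
    if [[ "$state" == "f" ]]; then
      echo "$node"
    fi
  done
}

check_single_primary() {
  local -a primaries
  mapfile -t primaries < <(list_primaries "$@")
  case ${#primaries[@]} in
    1) echo "OK: primary is ${primaries[0]}" ;;
    0) echo "WARNING: no reachable primary" >&2; return 1 ;;
    *) echo "CRITICAL: multiple primaries: ${primaries[*]}" >&2; return 2 ;;
  esac
}
```

For example, `check_single_primary hostname122 hostname123` could run from cron or a monitoring system; exit status 2 signals the multiple-primary case.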

You must take care that the old, retired primary does not come back into
the cluster after you have switched the primary role to the new primary
server, which might be difficult when the network is unstable.

Look into the concepts of "fencing" and "STONITH", for example here:
https://www.postgresql.org/docs/15/warm-standby-failover.html

cheers 

Norbert Poellmann

--
Norbert Poellmann EDV-Beratung             email  : np@xxxxxx
Severinstrasse 5                           telefon: 089 38469995  
81541 Muenchen, Germany                    telefon: 0179 2133436