Hi!

Thanks for the replies. In my case I'm ruling out automatic DC failover by setting zero priority on the DC2 nodes (node3 and node4). A DC1 (node1 and node2) failure is considered a major event that requires manual intervention anyhow. I still have repmgrd running on all nodes, though. Still, the split brain issue remains for node1/node2.

Martin Goodson is right that a pgbouncer reconfig is not a magic bullet. If one of the pgbouncer instances is also affected (or sits on the affected part of the network), whether it is installed on the client machine or separately, this solution will still produce a split brain scenario.

I've been looking at a STONITH-like solution. In my case it's a VMware environment. When node2 is being promoted it sends a signal to vCenter to kill node1 (rough sketch below). This could work, though there are security concerns. Should a VM be able to control another VM via vCenter? What if that call fails (is it synchronous or async/queued)? Do we even find out in vCenter if it fails? If it's a synchronous call, should promotion be cancelled when it returns an error? I haven't looked hard enough yet, but I hope to find a way for vCenter to monitor for a file on node2 which would trigger the shutdown of node1. The file would be created when promoting node2.

I've also thought of monitoring the cluster state from node1 itself. When node1 comes back it could detect that the rest of the cluster has elected a new master (by connecting to the slaves and checking the repmgr tables) and shut itself down. However, this still leaves a short window in which node1 will accept connections. And if it's a network issue that splits the clients as well, we'd have split brain immediately. So it's a no-go: you can only elect a new master once the old one is definitely dead.

Consider a network split that leaves some or all clients connected to the original master, along with, say, node3 from DC2, so COMMITs still succeed. During the timeout before node2 kills node1, some clients will still write to node1, and that data never makes it to node2. During promotion it will then be determined that node3 has the latest data [1]. What happens if its priority is zero, though?

I will need to test all this, but it looks like I'll have to allow automatic DC failover to occur and just set a low priority for nodes 3 and 4. Maybe I'll need a witness server as well then.

[1] https://github.com/2ndQuadrant/repmgr/blob/master/docs/repmgrd-failover-mechanism.md
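To make the vCenter idea a bit more concrete, below is roughly the fence script I have in mind for node2's promotion hook. It's only an untested sketch: it assumes pyVmomi, the vCenter host, service account, password and VM name are placeholders, and looking the VM up by its DNS name requires VMware Tools to be reporting it.

#!/usr/bin/env python
# Rough STONITH-style fence sketch (untested): power off the old master VM
# via vCenter before promoting this node. Assumes pyVmomi is installed and
# that the vCenter host, credentials and VM name below are placeholders.
import ssl
import sys

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask

VCENTER = "vcenter.example.com"    # placeholder
USER = "fence-svc@vsphere.local"   # placeholder service account
PASSWORD = "secret"                # placeholder; keep real credentials elsewhere
OLD_MASTER = "node1"               # DNS name of the master being fenced


def main():
    ctx = ssl._create_unverified_context()  # or a proper CA-verified context
    si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        # Look the VM up by guest DNS name (needs VMware Tools); the last
        # argument restricts the search to virtual machines.
        vm = content.searchIndex.FindByDnsName(None, OLD_MASTER, True)
        if vm is None:
            print("fence: VM %s not found in vCenter" % OLD_MASTER)
            return 1
        if vm.runtime.powerState == "poweredOff":
            return 0  # already fenced, nothing to do
        # PowerOffVM_Task is asynchronous; WaitForTask blocks until it
        # completes and raises on error, making the call effectively
        # synchronous so the promotion wrapper can bail out on failure.
        WaitForTask(vm.PowerOffVM_Task())
        return 0
    except Exception as exc:
        print("fence: powering off %s failed: %s" % (OLD_MASTER, exc))
        return 1
    finally:
        Disconnect(si)


if __name__ == "__main__":
    # Non-zero exit means fencing failed and promotion should not proceed.
    sys.exit(main())

The wrapper that repmgr's promote_command points at would run this first and only call "repmgr standby promote" when it exits 0, which answers one of my own questions above: treat the power-off as a synchronous call and cancel the promotion if it fails.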
On Tue, Aug 15, 2017 at 10:31 AM, Marc Mamin <M.Mamin@xxxxxxxxxxxx> wrote:
>
>> I finally found this document NOT referenced from the main README file in the repmgr repo.
>>
>> https://github.com/2ndQuadrant/repmgr/blob/master/docs/repmgrd-node-fencing.md
>>
>> I guess the default solution is pgbouncer
>
> Hello,
> I'm not sure that any solution can be considered standard, but we did implement such a solution with pgbouncer.
> The script in the linked reference seems somewhat dangerous to me, as it first reconfigures pgbouncer and then promotes.
> This is not safe if the postgres nodes were to suffer a split brain.
>
> In our case we used the following sequence:
> - stop pgbouncer
> - promote
> - reconfigure and restart pgbouncer
>
> This same sequence can be used for a manual switchover.
>
> regards,
>
> Marc Mamin
>
>>
>> Any simpler solutions for this tricky problem?
>>
>> Regards,
>>
>> Aleksander
>>
>> On Mon, Aug 14, 2017 at 5:03 PM, Aleksander Kamenik <aleksander.kamenik@xxxxxxxxx> wrote:
>>> Hi!
>>>
>>> In a cluster set up with Postgres 9.6, streaming replication and
>>> repmgr I'm struggling to find a good/simple solution for avoiding
>>> split brain.
>>>
>>> The current theoretical setup consists of 4 nodes across two data
>>> centers. The master node is set up with 1-of-3 synchronous replication,
>>> that is, it waits for at least one other node to COMMIT as well.
>>> repmgrd is installed on every node.
>>>
>>> The clients will use PostgreSQL JDBC with targetServerType=master, so
>>> they connect only to the master server in a list of four hosts.
>>>
>>> The split brain scenario I foresee is when the master node locks up or
>>> is isolated for a while and comes back online after repmgrd on the
>>> other nodes has elected a new master.
>>>
>>> As the original master requires one synced replication node and the
>>> remaining two standbys are streaming from the new master, it will
>>> fortunately not start writing a separate timeline, but it will still
>>> serve stale read-only queries. For writes it will accept connections,
>>> which then hang. The repmgrd instance on the original master sees no
>>> problem either, so it does nothing.
>>>
>>> Ideally, though, this instance should be shut down, as it has no
>>> slaves attached and the status on the other nodes indicates this
>>> master has failed.
>>>
>>> Any suggestions? I'm trying to keep the setup simple, without a
>>> central pgbouncer/pgpool. Is there any simple way to avoid a central
>>> connection point or a custom monitoring script that looks for exactly
>>> this issue?
>>>
>>> Also, do you see any other potential pitfalls in this setup?
>>>
>>> Thanks for thinking this through,
>>>
>>> Aleksander
>>>
>>> --
>>> Aleksander Kamenik
>>
>> --
>> Aleksander Kamenik

--
Aleksander Kamenik

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin