Re: Patroni vs pgpool II

Jehan-Guillaume de Rorthais <jgdr@xxxxxxxxxx> · Fri, 7 Apr 2023 09:45:57 +0200

On Fri, 07 Apr 2023 13:16:59 +0900 (JST)
Tatsuo Ishii <ishii@xxxxxxxxxxxx> wrote:

> >> > But, I heard PgPool is still affected by Split brain syndrome.    
> >> 
> >> Can you elaborate more? If 3 or more pgpool watchdog nodes (the
> >> number of nodes must be odd) are configured, a split brain can be
> >> avoided.  
> > 
> > Split brain is a hard situation to avoid. I suppose OP is talking about
> > PostgreSQL split brain situation. I'm not sure how PgPool's watchdog would
> > avoid that.  
> 
> Ok, "split brain" means here that two or more PostgreSQL primary
> servers exist.
> 
> Pgpool-II's watchdog has a feature called "quorum failover" to avoid
> the situation. To make this work, you need to configure 3 or more
> Pgpool-II nodes. Suppose they are w0, w1 and w2. Also suppose there
> are two PostgreSQL servers pg0 (primary) and pg1 (standby). The goal
> is to avoid that both pg0 and pg1 become primary servers.
> 
> Pgpool-II periodically monitors PostgreSQL health by checking
> whether it can reach the PostgreSQL servers. Suppose w0 and w1
> detect that pg0 is healthy but pg1 is not, while w2 thinks the opposite,
> i.e. pg0 is unhealthy but pg1 is healthy (this could happen if w0, w1,
> and pg0 are in network A, but w2 and pg1 are in a different network B,
> and A and B cannot reach each other).
> 
> In this situation, if w2 promotes pg1 because pg0 seems to be down, then
> the system ends up with two primary servers: split brain.
> 
> With quorum failover enabled, w0, w1, and w2 communicate with each other
> to vote on who is correct (if a watchdog cannot reach another one, it
> regards that watchdog as down). In the case above w0 and w1 are the
> majority and win. Thus w0 and w1 just detach pg1 and keep on using pg0
> as the primary. w2, on the other hand, loses the vote and gives up
> promoting pg1, so the split brain is avoided.
> 
> Note that in the configuration above, clients access the cluster via
> a VIP. The VIP is always controlled by the majority watchdogs, so clients
> will not access pg1, which is set to down status by w0 and w1.
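
To make the vote described above concrete, here is a minimal sketch of a
majority decision. It is not Pgpool-II's actual implementation: the function
name, the `views` mapping and the `total_watchdogs` parameter are all
assumptions made for illustration. Each side of the partition only counts
the watchdogs it can still reach.

```python
from collections import Counter


def quorum_decision(views, total_watchdogs):
    """Return the backend a partition may treat as primary, or None.

    views: hypothetical mapping {watchdog name: backend it sees as healthy},
           containing only the watchdogs reachable inside this partition.
    total_watchdogs: number of watchdogs configured in the whole cluster.
    """
    needed = total_watchdogs // 2 + 1  # strict majority
    if not views:
        return None
    candidate, votes = Counter(views.values()).most_common(1)[0]
    # Without a strict majority the partition must not promote anything.
    return candidate if votes >= needed else None


# Network A: w0 and w1 reach each other and both see pg0 as healthy.
side_a = quorum_decision({"w0": "pg0", "w1": "pg0"}, total_watchdogs=3)
# Network B: w2 is alone and sees pg1 as healthy.
side_b = quorum_decision({"w2": "pg1"}, total_watchdogs=3)

print(side_a)  # pg0
print(side_b)  # None
```

With three watchdogs the threshold is two votes, so the isolated w2 can
never reach it on its own, whatever it believes about pg0.
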
> 
> > To avoid split brain, you need to implement a combination of quorum and
> > (self-)fencing.
> > 
> > Patroni quorum is in the DCS's hands. Patroni's self-fencing can be achieved
> > with the (hardware) watchdog. You can also implement node fencing through
> > the "pre_promote" script to fence the old primary node before promoting the
> > new one.
> > 
> > If you need HA with a high level of anti-split-brain security, you'll not be
> > able to avoid some sort of fencing, no matter what.
> > 
> > Good luck.  
> 
> Well, if you define fencing as STONITH (Shoot The Other Node in the
> Head), Pgpool-II does not have the feature.

And I believe that's part of what Cen was complaining about:

«
  It is basically a daemon glued together with scripts for which you are
  entirely responsible. Any small mistake in failover scripts and
  cluster enters a broken state.
»

If you want to build something clean, including fencing, you'll have to
handle/develop it yourself in scripts.
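
As a rough illustration of what such a script has to guarantee, here is a
minimal "fence first, promote second" sketch. It is not taken from Pgpool-II
or Patroni: `failover`, `fence_old_primary` and `promote_standby` are
hypothetical names, and the two callables stand in for whatever fencing
device and promote command you actually use.

```python
def failover(fence_old_primary, promote_standby):
    """Promote the standby only once the old primary is confirmed fenced.

    fence_old_primary: hypothetical callable, must return True only when
                       the old primary is confirmed off or isolated.
    promote_standby:   hypothetical callable that promotes the standby.
    Returns a shell-style exit code: 0 on promotion, 1 when refused.
    """
    try:
        fenced = bool(fence_old_primary())
    except Exception:
        # An error while fencing means the old primary's state is unknown.
        fenced = False
    if not fenced:
        return 1  # never promote while the old primary might still run
    promote_standby()
    return 0


# Stub callables for demonstration only.
promoted = []
ok = failover(lambda: True, lambda: promoted.append("pg1"))
refused = failover(lambda: False, lambda: promoted.append("pg1"))

print(ok, refused, promoted)  # 0 1 ['pg1']
```

The important part is the ordering and the refusal path: if fencing cannot
be confirmed, the script exits non-zero and nothing gets promoted.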

> However I am not sure STONITH is always mandatory.

Sure, it really depends on how much risk you can take and how much complexity
you can afford. Some clusters can live with a 10-minute split brain, whereas
others cannot survive a 5s split brain.

> I think that depends on what you want to avoid using fencing. If the purpose
> is to avoid having two primary servers at the same time, Pgpool-II achieves
> that as described above.

How could you be so sure?

See https://www.alteeve.com/w/The_2-Node_Myth

«
  * Quorum is a tool for when things are working predictably
  * Fencing is a tool for when things go wrong
»

Regards,