Re: HA Setup Review

Ron Johnson <ronljohnsonjr@xxxxxxxxx> · Tue, 30 Apr 2024 10:20:47 -0400

You're confusing HA with DR.

A 3-node cluster, with two nodes in the primary DC and the third (asynchronously replicated) in the remote DC, will give you both.

ZERO downtime is -- to my knowledge -- impossible with master-slave replication. There will always be some seconds of lag while the secondary-that-was is promoted to new-primary, and the applications that were forcibly disconnected from the old primary are connected to the new-primary.

Heck, even in a master-master DB cluster, any connections on the master that dies will be down until they can connect to the other master.
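To make the reconnect window concrete, here is a minimal sketch of client-side retry logic, assuming the application can simply retry its connection attempt until the new primary accepts it. The function name connect_with_retry and its parameters are hypothetical; connect stands in for whatever connect call your driver provides, and ConnectionError stands in for the driver's own connection exception.

```python
import time


def connect_with_retry(connect, attempts=10, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is any zero-argument callable that returns a connection or
    raises ConnectionError (a placeholder for a real driver's error type).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Give the failover time to promote the standby.
                sleep(delay)
    # Every attempt failed: surface the last connection error.
    raise last_error
```

This hides the outage from the caller only if the failover finishes within attempts * delay seconds. Sessions that were in flight on the old primary are still lost and must be retried by the application.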

On Tue, Apr 30, 2024 at 8:58 AM Deepak Pahuja . <deepakpahuja@xxxxxxxxxxx> wrote:

Hi Ron,

Thanks for the details.

Kindly share how we can achieve HA in PostgreSQL. Basically, my requirement is zero downtime for the application and the database.

In this scenario we have to do a failover, and during that time there will be an outage. Kindly correct me if I am wrong.

Also, please share how we can achieve zero downtime of the database (primary write always available) in PG.

Thanks, Deepak

From: Ron Johnson <ronljohnsonjr@xxxxxxxxx>

Sent: Tuesday, April 30, 2024 8:22:36 PM

To: pgsql-admin <pgsql-admin@xxxxxxxxxxxxxx>

Subject: Re: HA Setup Review

On Tue, Apr 30, 2024 at 3:41 AM akshay polji <akshay.polji@xxxxxxxxx> wrote:

Hello Team,

I am looking for some feedback on the HA Setup that we are finalizing for running our business critical workloads.

We are planning to follow this setup:

https://www.pgpool.net/docs/42/en/html/example-cluster.html

Basically a 3-node PostgreSQL cluster, with each node running 3 processes, i.e. PostgreSQL DB, PGPool and WatchDog.
These 3 nodes will be distributed across 3 availability zones/data centers for resilience and will use synchronous replication between Primary and Stand-by.

You're describing HA+DR, not just HA.

Also, I wouldn't do synchronous replication across the WAN. Not only is the latency too high for decent performance, but any fault in the network freezes the DB.
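A rough back-of-the-envelope sketch of the latency point, assuming each synchronous commit waits for at least one network round trip to the standby and ignoring disk flush time on either side. The round-trip figures are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def max_serial_commits_per_second(round_trip_ms):
    """Upper bound on back-to-back commits for ONE session when every
    commit waits one network round trip for the standby's acknowledgement."""
    return 1000.0 / round_trip_ms


# Assumed round-trip times: 0.5 ms inside one DC, 40 ms across a WAN.
for label, rtt_ms in [("same DC", 0.5), ("cross-region WAN", 40.0)]:
    rate = max_serial_commits_per_second(rtt_ms)
    print(f"{label}: at most {rate:.0f} commits/s per session")
```

Under these assumptions a single session drops from roughly 2000 serial commits per second to 25. Concurrent sessions raise the total throughput, but each individual commit still pays the round trip.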

The synchronous option will be Any One, so that DB availability is not impacted if one Stand-by is down, even for a planned outage such as patching of the DB or the virtual machine.

You can switch from async to sync replication just before patching, and then switch back to async when it's completed.
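Switching between the two modes comes down to changing synchronous_standby_names on the primary and reloading the configuration. Below is a minimal sketch that only builds the SQL statements, assuming quorum-based ("ANY n") synchronous replication. The helper name sync_standby_sql and the standby names node2 and node3 are hypothetical; the names must match the application_name each standby uses in its connection to the primary.

```python
def sync_standby_sql(standby_names, quorum=1):
    """Return the SQL statements that enable quorum-based synchronous
    replication, or switch to asynchronous when standby_names is empty."""
    if standby_names:
        value = f"ANY {quorum} ({', '.join(standby_names)})"
    else:
        # An empty value means no standby is synchronous (async mode).
        value = ""
    return [
        f"ALTER SYSTEM SET synchronous_standby_names = '{value}';",
        "SELECT pg_reload_conf();",
    ]


# Hypothetical standby names, for illustration only.
for stmt in sync_standby_sql(["node2", "node3"]):
    print(stmt)
for stmt in sync_standby_sql([]):
    print(stmt)
```

The statements would be run on the primary as a superuser. This sketch does no quoting, so standby names containing special characters would need to be double-quoted inside the value.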

That's pretty much what we do for HA, except only two DB instances (but still three PgPool instances), and they are local and asynchronously replicated. DR is handled by VMware SRM.

Watchdog and heartbeat are built into PgPool. Is that what you're using for WD and HB?