Re: Advice on cluster architecture for two related, but distinct, use cases

Zahid Rahman <zahidr1000@xxxxxxxxx> · Thu, 7 Nov 2024 23:49:16 +0000

Perhaps a 14 minute investment in this article may prove fruitful.
https://medium.com/@martin.hodges/adding-a-postgres-high-availability-database-to-your-kubernetes-cluster-634ea5d6e4a1

On Thu, 7 Nov 2024, 21:06 Matthias Leisi, <matthias@xxxxxxxxx> wrote:
Dear all,
(This is a follow-up to a question I asked almost exactly a year ago, https://postgrespro.com/list/thread-id/2670756#726F3765-858C-4AC0-A7B0-5CB6720E4B37@xxxxxxxxx - the requirements have changed since then, and the platform has changed from OpenBSD to Linux, which may make some things easier.)

I’m looking for advice on Postgres cluster architecture(s) for two related but distinct use cases. Ideally, the approaches for the two use cases would not differ too widely.

The goal of clustering is low RPO (I guess we need sync clustering) and RTO (ideally almost-instant failover, but a failover process of up to a minute in the worst case could be acceptable); throughput is not a concern (it’s relatively low transaction volume except for some often-written statistics data, which is still moderate). Latency (due to the distance between datacenters for georedundancy) is a fact we are willing to accept.

The first use case is in an environment under our own control (and where eg a DBA could intervene). We can theoretically run any number of cluster instances, but assume we would use an even number (split over the two datacenters), or potentially an odd number of nodes (eg with an arbiter). We could use a load balancer, but I guess this would strongly deviate from the second use case:

In the second use case, the environment is not under our control, so we can only assume basic network connectivity from the application to the DB, and between the DBs (the latter potentially through an SSH tunnel if needed). In this use case, we can not assume a person to intervene if a node goes down, and would prefer some automated failover to the other node (this automation would also be welcome for the first use case, eg if something happens while nobody is watching). We can not assume eg a load balancer.

There could be various ways how the environment in the second use case is set up, ranging from „application and database running on the same box“ (well, no clustering for you then…), to dedicated two- or three node database cluster serving a number of application machines.

In both use cases, we have full control over the application and the database code and environment.

From reading various docs, it seems we would need something like Patroni (/Percona), at least for the first use case. However it seems relatively complex to set up and operate.

I would appreciate your experience and input into which approach would best fit the two use cases. We are also willing to engage in paid consulting.

Thanks,
— Matthias

-- 
Matthias Leisi
Katzenrütistrasse 68, 8153 Rümlang
Mobile +41 79 377 04 43
matthias@xxxxxxxxx