Re: Cluster for an appliance-type deployment

Christian Ramseyer <ramseyer@xxxxxxxxxx> · Wed, 8 Nov 2023 00:07:10 +0100

On 06.11.23 20:26, Matthias Leisi wrote:
Dear all,

I’m reasonably experienced with Postgres with simple (single, „rebuild 
and restore“) requirements, but would need some hints on what to look 
for in a more complex situation - deploying Postgres as the backend for 
a (virtual) appliance.

This appliance can scale horizontally from a single to dozens of VMs 
(theoretically more, but most installations are small-ish). It is 
feasible to configure VMs for particular purposes (eg „you are [also] a 
DB node“), but basically all instances will/should be able to perform 
their main tasks besides (also) being a DB node. As the VMs may be 
installed in very different environments, network-based solutions are 
less feasible and we would prefer a DB-level solution. We assume that 
for most cases, primary/stand-by configurations would be sufficient in 
terms of availability / latency / throughput.

We must also assume that there is no person who would be able to touch 
things if an error occurs. Data consistency and (as much as possible) 
automated recovery from error situations („VM down“, „network lost“, …) 
are therefor more important than „n nines". We can assume that the VMs 
can talk to each other over TCP (eg using SSH tunnels, direct Postgres 
connection, or some other suitable protocol). Scripting „around“ the 
database is available to initialize instances and for similar tasks.

Would Postgres’ own log-shipping (file-based + streaming replication, 
possibly with remote_write) be sufficient for such a set of requirements?

What aspects would you consider important for such a scenario?

The replication that ships with Postgres gives you one writeable primary 
server and a number of standbys, but it has no tools to automatically 
discover or recover from failure. From 
https://www.postgresql.org/docs/current/warm-standby-failover.html:

---

PostgreSQL does not provide the system software required to identify a 
failure on the primary and notify the standby database server. Many such 
tools exist and are well integrated with the operating system facilities 
required for successful failover, such as IP address migration.

Once failover to the standby occurs, there is only a single server in 
operation. This is known as a degenerate state. The former standby is 
now the primary, but the former primary is down and might stay down. To 
return to normal operation, a standby server must be recreated, either 
on the former primary system when it comes up, or on a third, possibly 
new, system. The pg_rewind utility can be used to speed up this process 
on large clusters. Once complete, the primary and standby can be 
considered to have switched roles. Some people choose to use a third 
server to provide backup for the new primary until the new standby 
server is recreated, though clearly this complicates the system 
configuration and operational processes.

---

So you need to add additional parts from the ecosystem to detect 
failure, handle failover, potentially move an IP address with the 
Primary etc. Popular tools are repmgr, Patroni, CloudNativePG, BDR, 
pacemaker+corosync and endless others. They will address many of your 
requirements, but still some work and understanding is required to make 
them fully unattended, as well as have them expand dynamically with new 
replicas if an appliance is added.

Postgres is an amazing product and I like to use it for almost 
everything, but in this scenario we are of course making our live hard 
with ACID compliance. In NoSQL/"eventual consistency" land, there are 
products that are a lot friendlier to a setup like this - stuff like 
Cassandra, etcd, CouchDb comes to mind. I'd compare the pros and cons of 
such alternatives, the big con of course being a lot less consistency 
and durability guarantees - the question is whether the application 
needs it.

But enough about non-Postgres topics on this list :) To go with 
Postgres, on a hunch I'd try Patroni first: it does a lot of the 
advanced failover stuff, has a great track record, and supposedly runs 
on BSD: https://openports.pl/path/databases/patroni

Cheers
Christian

--
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64