pg_rewind after promote

Emond Papegaaij <emond.papegaaij@xxxxxxxxx> · Thu, 28 Mar 2024 15:52:52 +0100

Hi,
We develop an application that uses PostgreSQL in combination with Pgpool as a database backend for a Jakarta EE application (on WildFly). This application supports running in a clustered setup with 3 nodes, providing both high availability and load balancing. Every node runs an instance of the database, a pgpool and the application server. Pgpool manages the PostgreSQL replication using async streaming replication, with 1 primary and 2 standby nodes.

The versions used are (containerized on debian:bullseye-slim):
PostgreSQL version 12.18
Pgpool2 version 4.5.0

The problem we are seeing happens during planned maintenance, for example, when updates are installed and the hosts need to reboot. We take the hosts out of the cluster one at a time, perform the updates and reboot, and bring the host back into the cluster. If the host that needs to be taken out has the role of the primary database, we need to perform a failover. For this, we perform several steps:
 * we detach the primary database backend, forcing a failover
 * pgpool selects a new primary database and promotes it
 * the other 2 nodes (the old primary and the other standby) are rewound and streaming is resumed from the new primary
 * the node that needed to be taken out of the cluster (the old primary) is shutdown and rebooted

This works fine most of the time, but sometimes we see this message on one of the nodes:
pg_rewind: source and target cluster are on the same timeline pg_rewind: no rewind required
This message seems timing related, as the first node might report that, while the second reports something like:
pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21 pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21 pg_rewind: Done!

If we ignore the response from pg_rewind, streaming will break on the node that reported no rewind was required. On the new primary, we do observe the database moving from timeline 21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind. This window where the new timeline does exist, but is not observed by pg_rewind makes our failover much less reliable. So, I've got 2 questions:

1. Is my observation about the starting of a new timeline correct?
2. If yes, is there anything we can do during to block promotion process until the new timeline has fully materialized, either by waiting or preferably forcing the new timeline to be started?

Best regards,
Emond Papegaaij