Hello Victor,
On 25.08.2023 at 13:18, Victor Sudakov wrote:
Dear Colleagues,
Do you perchance know what the correct procedure is for temporarily
taking down a replica in a Patroni cluster, e.g. for 5-10 minutes of
hardware maintenance?
The problem is that after stopping the patroni process (service) on a
replica, patroni removes the corresponding physical replication slot
from the leader, and unless the wal_keep_size value is insanely high,
the replica, when up again, cannot restart streaming because the WAL
segments are already gone from the leader.
Well, you all know:
<%%%>LOG: started streaming WAL from primary at B4A0/E2000000 on timeline 8
<%%%>FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000080000B4A0000000E2 has already been removed
<%%%>LOG: waiting for WAL to become available at B4A0/E2002000
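For reference, the segment file name in the FATAL line follows from the timeline and the LSN in the first log line. A small sketch of that mapping, assuming the default 16 MiB WAL segment size (the function name wal_segment_name is only illustrative, not something PostgreSQL or Patroni provides):

```python
# Assumption: the cluster uses the default WAL segment size of 16 MiB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline: int, lsn: str,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL segment file name that contains the given LSN."""
    # An LSN is printed as two hex numbers: high 32 bits / low 32 bits.
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)

    # Absolute segment number, then split into the two 8-digit name parts.
    segno = position // segment_size
    segments_per_xlogid = 0x100000000 // segment_size

    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


print(wal_segment_name(8, "B4A0/E2000000"))
```

With the values from the log above (timeline 8, LSN B4A0/E2000000) this yields 000000080000B4A0000000E2, i.e. the segment the leader reports as already removed.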
Do you think there is a way to tell Patroni that a replica is down
temporarily and its replication slot should not be removed?
Or, what am I missing?
You may use patronictl pause before the maintenance and patronictl
resume afterwards.
Keep in mind to set wal_keep_size (or wal_keep_segments, depending on
your PG version) high enough.
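What "high enough" means depends on how much WAL the leader writes while the replica is down. A rough back-of-the-envelope sketch, assuming you know your average WAL generation rate; the function name, the safety factor of 2 and the example numbers are illustrative assumptions, not measured values:

```python
import math


def estimate_wal_keep_size_mb(wal_rate_mb_per_s: float,
                              downtime_minutes: float,
                              safety_factor: float = 2.0,
                              segment_size_mb: int = 16) -> int:
    """Estimate a wal_keep_size value (in MB) for a planned replica outage.

    Assumptions: a roughly constant WAL rate during the outage and the
    default 16 MB segment size. The result is rounded up to whole segments.
    """
    needed_mb = wal_rate_mb_per_s * downtime_minutes * 60 * safety_factor
    segments = math.ceil(needed_mb / segment_size_mb)
    return segments * segment_size_mb


# Hypothetical example: 5 MB/s of WAL, 10 minutes of maintenance.
size_mb = estimate_wal_keep_size_mb(5, 10)
print(f"wal_keep_size = {size_mb}MB")
print(f"wal_keep_segments = {size_mb // 16}")  # for PG versions without wal_keep_size
```

For the hypothetical numbers above this gives 6000 MB, or 375 segments. Measure your own WAL rate under peak load before relying on such a figure.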
regards
Georg