On 27/2/19 2:52 PM, Johannes Truschnigg wrote:
Hi Jay,
On Wed, Feb 27, 2019 at 07:40:26AM -0500, John Scalia wrote:
Hello, folks,
Yesterday, I had a small file system fill up, due to some logical
replication testing we had been performing. We had been testing IBM’s IIDR
system and apparently it had built a logical replication slot on my server.
When the test was completed, nobody removed the slot, so WAL segments
stopped being dropped. Now, I can understand the difficulty of separating what
physical versus logical replication needs from the WAL segments, but as
logical replication is database-specific, not cluster-wide, this behavior was
a little unexpected, since the WAL segments are cluster-wide. Are WAL
segments going to pile up whenever something drops a logical replication
connection? I’ve seen it, but it seems like this could be a bad thing.
Since logical replication is piggybacked on the same cluster-wide WAL that
physical replication uses, you cannot have the former without retaining that
WAL. And yes, what you experienced is one of the dangers of using replication
slots with a busy database (i.e. one producing lots of WAL) and a filesystem
with little excess space. Under these
circumstances, it is imperative to monitor for (and alert on) anything going
awry with your replication slot consumers, and/or the size of your wal/xlog
directory. It's a feature of replication slots to work that way - but one that
may end up biting you.
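A minimal sketch of the kind of check described above, assuming PostgreSQL 10
or later (where the function is named pg_current_wal_lsn) and that some client
runs the query and hands the rows to Python. The threshold, the helper names
and the slot names in the usage note are hypothetical; only the
pg_replication_slots view and its columns come from PostgreSQL itself.

```python
# Sketch only: helper names and thresholds are illustrative assumptions.
# The query assumes PostgreSQL 10+ naming (pg_current_wal_lsn).
SLOT_QUERY = """
SELECT slot_name, active, restart_lsn, pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
"""


def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_bytes(restart_lsn, current_lsn):
    """Bytes of WAL between a slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slots_over_threshold(rows, max_bytes):
    """Return (slot_name, active, held_bytes) for slots holding too much WAL.

    rows: iterable of (slot_name, active, restart_lsn, current_lsn),
    i.e. the result of SLOT_QUERY fetched by whatever client you use.
    """
    alerts = []
    for slot_name, active, restart_lsn, current_lsn in rows:
        if restart_lsn is None:  # slot has not reserved any WAL yet
            continue
        held = retained_bytes(restart_lsn, current_lsn)
        if held > max_bytes:
            alerts.append((slot_name, active, held))
    return alerts
```

Fed with a row for an abandoned, inactive slot that is several GiB behind,
slots_over_threshold(rows, 1 << 30) would return that slot, which is the
signal to alert on before the filesystem fills up.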
A logical approach for replication slots would be to accept a parameter setting the maximum amount of WAL to retain, beyond which WAL would be removed and the primary server saved. Pretty much like the
--archive-push-queue-max argument of pgBackRest.
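To make the idea concrete, here is a toy model of such a cap. It is a sketch of
the proposal only, not of anything the server did at the time of writing; the
function name, the byte positions and the slot names are hypothetical.
(PostgreSQL 13 later added a max_slot_wal_keep_size setting along these lines.)

```python
def apply_wal_cap(current_pos, slot_positions, max_keep_bytes):
    """Toy model of a cap on the WAL retained on behalf of slots.

    current_pos: current WAL write position, in bytes.
    slot_positions: dict of slot name -> oldest byte position it still needs.
    max_keep_bytes: hypothetical limit on WAL kept for slots.
    Returns (keep_from, lost_slots): the oldest position still retained and
    the slots whose required WAL would have been removed.
    """
    floor = current_pos - max_keep_bytes  # oldest position the cap allows
    wanted = min(slot_positions.values(), default=current_pos)
    keep_from = max(wanted, floor)  # the cap wins over the slowest slot
    lost = sorted(name for name, pos in slot_positions.items() if pos < keep_from)
    return keep_from, lost
```

With a cap like this, an abandoned slot loses its WAL and has to be re-created,
but the primary keeps running instead of filling its disk.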
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt