Good morning everyone,
Following some troubleshooting last night, we have managed to resolve the issue.
After plowing through the entire thing, we came to the conclusion that the cluster was running but not replicating.
So the number one lesson learned is to always check replication in the cluster first, both for the sake of data safety and to avoid having to go through a million other things.
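As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already fetched rows from pg_stat_replication on the master (the query is shown in the comment), that each replica reports its node name as application_name, and that the replicas are called db02 and db03. Those names, the row shape and the 16 MB lag threshold are all made up for the example, not taken from this cluster.

```python
# Sketch only: node names, row shape and threshold are assumptions for illustration.
# Rows are expected to come from a query run on the master, such as:
#   SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;


def lsn_to_int(lsn):
    """Convert a textual LSN like '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_problems(expected_replicas, rows, max_lag_bytes=16 * 1024 * 1024):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    problems = []
    seen = {row["application_name"]: row for row in rows}
    for name in expected_replicas:
        row = seen.get(name)
        if row is None:
            # The replica has no walsender on the master at all.
            problems.append(f"{name}: not connected")
            continue
        if row["state"] != "streaming":
            problems.append(f"{name}: state is {row['state']}")
        lag = lsn_to_int(row["sent_lsn"]) - lsn_to_int(row["replay_lsn"])
        if lag > max_lag_bytes:
            problems.append(f"{name}: replay lag {lag} bytes")
    return problems


if __name__ == "__main__":
    # An empty result set is the "running but not replicating" case.
    print(replication_problems(["db02", "db03"], []))
```

An empty result from pg_stat_replication on the master is the quickest sign that no replica is attached, which is the situation we were in.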
This cluster is set up without a VIP, so db01 will always be the master. With it set up this way, we found that pg_hba.conf had this line:
host all postgres db01/cidr trust
From my understanding this means that the master was trying to replicate to itself and not trusting the other nodes?
To fix this, we put in the entire network instead:
host all postgres network/cidr trust
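To show why the address field made the difference, here is a small Python sketch using the standard ipaddress module. The IP addresses and mask lengths are hypothetical (the real values are elided above as db01/cidr and network/cidr); it only demonstrates that a single-host range matches db01 alone, while a network range also covers the replicas.

```python
import ipaddress


def rule_matches(rule_address, client_ip):
    """True if client_ip falls inside the CIDR range from a pg_hba.conf address field."""
    # strict=False only makes this sketch tolerant of host bits in the input.
    network = ipaddress.ip_network(rule_address, strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # Hypothetical addresses: db01 = 10.0.0.11, db02 = 10.0.0.12
    old_rule = "10.0.0.11/32"  # stands in for db01/cidr
    new_rule = "10.0.0.0/24"   # stands in for network/cidr
    for ip in ("10.0.0.11", "10.0.0.12"):
        print(ip, "old:", rule_matches(old_rule, ip), "new:", rule_matches(new_rule, ip))
```

With the old rule only the master's own address matches, so connections arriving from the replicas never hit that trust line.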
Following the config amendment, we reinitialised the replicas with: patronictl -c /path/patroni/config reinit <cluster> <host>
Happy to say that cleanup of the WAL files kicked in, and usage of the /var volume is now down to 4% from 96%.
Now then, there is still the bit with the actual postgres logs not rotating properly, lol, but I'll leave that for another email.
Massive thank you to all of you for the support.
On Thu, Jan 23, 2025 at 7:02 PM Saul Perdomo <saul.perdomo@xxxxxxxxx> wrote:
Thanks for the correction Adrian - my oversimplification went too far, and into "plain wrong" territory.
(The detail that I felt was too much for this explanation was: "and the way to simply get rid of them would be to set your archive command to '/bin/true', say".. but didn't want to make it seem like I was suggesting Paul do that)
On Thu, Jan 23, 2025, 11:07 a.m. Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 1/23/25 06:51, Saul Perdomo wrote:
> This is why everybody will tell you "don't just delete these files,
> archive them properly!" Again, for operational purposes, you could just
> delete them. But you really want to make a /copy/ of them before you
> do... you know, /just in case/ something bad happens to your DB that
> makes you want to roll it back in time.
No, you can't just delete them for operational purposes without knowing
whether they are still needed or not.
Per:
https://www.postgresql.org/docs/current/wal-intro.html
and
https://www.postgresql.org/docs/current/wal-configuration.html
Short version: a WAL file must remain until a checkpoint is done that
makes its contents no longer needed.
> Cheers
> Saul
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx