On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Tuesday, March 25, 2014, Steven Schlansker <steven@xxxxxxxxxxxx> wrote:
Hi everyone,
I have a Postgres 9.3.3 database machine. Due to some intelligent work on the part of someone who shall remain nameless, the WAL archive command included a ‘> /dev/null 2>&1’ which masked archive failures until the disk entirely filled with 400GB of pg_xlog entries.
PostgreSQL itself should still be logging archive failures to the server log, regardless of whether the archive command's own output is thrown away.
I have fixed the archive command and can see WAL segments being shipped off of the server; however, the xlog remains at a stable size and is not shrinking. In fact, it’s still growing, at a (much slower) rate.
The leading edge of the log files should be archived as soon as they fill up, and recycled or deleted two checkpoints later. The trailing edge should be archived upon checkpoints and then recycled or deleted. I think there is a throttle on how many files off the trailing edge are archived at each checkpoint. So issue a bunch of "CHECKPOINT;" commands for a while and see if that clears it up.
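A minimal sketch of "issue a bunch of CHECKPOINT; commands for a while", in Python. It assumes `psql` is on the PATH and connects as a superuser (CHECKPOINT requires superuser in 9.3) through the usual PG* environment variables; the function name `issue_checkpoints` and its defaults (10 runs, 5 seconds apart) are hypothetical choices, not anything from the thread.

```python
import subprocess
import time


def issue_checkpoints(count=10, interval_s=5.0, runner=subprocess.run,
                      psql_args=("psql", "-X", "-c", "CHECKPOINT;")):
    """Run CHECKPOINT `count` times, sleeping `interval_s` seconds between runs.

    Hypothetical helper: connection details come from the PG* environment
    variables, and the connecting role must be a superuser.
    """
    for i in range(count):
        # check=True raises CalledProcessError if psql exits non-zero.
        runner(list(psql_args), check=True)
        if i < count - 1:
            time.sleep(interval_s)
    return count


if __name__ == "__main__":
    issue_checkpoints()
```

Checking the size of pg_xlog between runs shows whether each checkpoint is actually removing or recycling segments.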
Actually, my description is rather garbled: it mixes up what I saw when wal_keep_segments was lowered with what happens when recovering from a long-lasting archive failure. Nevertheless, checkpoints are what trigger the removal of excess WAL files. Are you logging checkpoints (log_checkpoints)? What do those log lines say? Also, what is in pg_xlog/archive_status?
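One hypothetical way to answer the archive_status question is a small Python helper that counts marker files. In 9.x, a segment still waiting to be archived has a `<segment>.ready` file in pg_xlog/archive_status, and one that was archived successfully has a `<segment>.done` file; with archive_mode on, only segments marked `.done` are eligible for removal or recycling at a checkpoint. The helper name and return keys are assumptions, and it must run on the database host with read access to the data directory.

```python
import re
from pathlib import Path

# A WAL segment file name is 24 upper-case hex digits (timeline, log, segment).
SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def archive_status_summary(pg_xlog):
    """Summarise pg_xlog and its archive_status directory (hypothetical helper)."""
    root = Path(pg_xlog)
    segments = [p.name for p in root.iterdir()
                if p.is_file() and SEGMENT_RE.fullmatch(p.name)]
    ready, done = [], []
    for entry in (root / "archive_status").iterdir():
        if entry.name.endswith(".ready"):
            ready.append(entry.name[: -len(".ready")])
        elif entry.name.endswith(".done"):
            done.append(entry.name[: -len(".done")])
    return {
        "segments": len(segments),
        "ready": len(ready),
        "done": len(done),
        # Names sort chronologically within a timeline, so min() is the oldest.
        "oldest_ready": min(ready) if ready else None,
    }
```

For example, `archive_status_summary("/var/lib/postgresql/9.3/main/pg_xlog")` (path assumed, Debian-style layout). A large `ready` count that is not falling suggests the archiver is still behind; many `done` files while pg_xlog does not shrink suggests checkpoints are not removing them yet.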
Cheers,
Jeff