On Mar 26, 2014, at 9:04 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

> On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> On Tuesday, March 25, 2014, Steven Schlansker <steven@xxxxxxxxxxxx> wrote:
> Hi everyone,
>
> I have a Postgres 9.3.3 database machine. Due to some intelligent work on the part of someone who shall remain nameless, the WAL archive command included a ‘> /dev/null 2>&1’ which masked archive failures until the disk entirely filled with 400GB of pg_xlog entries.
>
> PostgreSQL itself should be logging failures to the server log, regardless of whether those failures log themselves.
>
> I have fixed the archive command and can see WAL segments being shipped off of the server, however the xlog remains at a stable size and is not shrinking. In fact, it’s still growing at a (much slower) rate.
>
> The leading edge of the log files should be archived as soon as they fill up, and recycled/deleted two checkpoints later. The trailing edge should be archived upon checkpoints and then recycled or deleted. I think there is a throttle on how many off the trailing edge are archived each checkpoint. So issue a bunch of "CHECKPOINT;" commands for a while and see if that clears it up.

Indeed, forcing a bunch of CHECKPOINTs started to get things moving again.

> Actually my description is rather garbled, mixing up what I saw when wal_keep_segments was lowered, not when recovering from a long-lasting archive failure. Nevertheless, checkpoints are what provoke the removal of excessive WAL files. Are you logging checkpoints? What do they say? Also, what is in pg_xlog/archive_status?
>

I do log checkpoints, but most of them recycle and don’t remove:

Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG: checkpoint complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync files=342, longest=0.015 s, average=0.000 s

That said, after letting the db run / checkpoint / archive overnight, the xlog did indeed start to slowly shrink. The pace at which it is shrinking is somewhat unsatisfying, but at least we are making progress now!

I guess if I had just been patient I could have saved some mailing list traffic. But patience is hard when your production database system is running at 0% free disk :)

Thanks everyone for the help. If the log continues to shrink, I should be out of the woods now.

Best,
Steven
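
P.S. For anyone watching a similar recovery, here is a minimal sketch of how the "checkpoint complete" lines could be tallied to see whether segments are being removed or only recycled. It is not part of PostgreSQL; the function name parse_checkpoint and the 16 MB segment size (the default build setting) are assumptions for illustration, and the regular expression only targets the 9.3-style wording shown in the log line above.

```python
import re

# Matches the segment counters in a 9.3-style "checkpoint complete" log line.
CHECKPOINT_RE = re.compile(
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled"
)

# Assumption: default WAL segment size of 16 MB.
SEGMENT_MB = 16


def parse_checkpoint(line):
    """Return the added/removed/recycled counts, or None if the line does not match."""
    match = CHECKPOINT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


sample = (
    "LOG: checkpoint complete: wrote 177293 buffers (4.2%); "
    "0 transaction log file(s) added, 0 removed, 56 recycled; "
    "write=539.838 s, sync=0.049 s, total=539.909 s"
)

counts = parse_checkpoint(sample)
print(counts)
# Only removed segments give disk space back; recycled ones are renamed for reuse.
print("MB freed by this checkpoint:", counts["removed"] * SEGMENT_MB)
```

Summing the "removed" counts over a night of checkpoints gives a rough idea of how fast pg_xlog is actually shrinking.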