Re: xlog-cleanup problem

jayknowsunix@xxxxxxxxx · Tue, 4 Nov 2014 21:02:52 -0500

Tom,

When you say the script is consuming the WAL files, what is it actually doing? Here at Verizon, we just run pg_basebackup hourly from a cron job. Cron then invokes a Python script that copies the resulting tar.gz file to our cloud storage once the backup has finished. If your script uses some other PostgreSQL mechanism for backing up, most of those are just slow anyway.
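A minimal sketch of the setup described above. It combines both steps (pg_basebackup, then the copy) in one script for brevity. The script name, the directory arguments and the idea that "the cloud" is a mounted directory are all hypothetical. Connection settings are left to the usual libpq environment variables. The pg_basebackup flags used (`-D`, `-Ft`, `-z`) are standard; anything else about the real Verizon setup is an assumption.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def basebackup_command(target_dir: Path) -> list[str]:
    """Build a pg_basebackup call that writes gzip-compressed tar files."""
    # -D: output directory, -Ft: tar format, -z: gzip the tar output
    return ["pg_basebackup", "-D", str(target_dir), "-Ft", "-z"]


def copy_archives(source_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy every *.tar.gz written by pg_basebackup into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for archive in sorted(source_dir.glob("*.tar.gz")):
        target = dest_dir / archive.name
        shutil.copy2(archive, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied


def run_backup(work_dir: Path, cloud_dir: Path) -> None:
    """Take one base backup and copy it to a timestamped folder."""
    # pg_basebackup refuses a non-empty target directory, so remove the
    # previous run (hypothetical scratch directory, deleted every hour).
    if work_dir.exists():
        shutil.rmtree(work_dir)
    # check=True raises CalledProcessError on failure, so a failed
    # backup is never copied.
    subprocess.run(basebackup_command(work_dir), check=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M")
    copy_archives(work_dir, cloud_dir / stamp)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical crontab entry:
    # 0 * * * * python3 pg_hourly_backup.py /var/backups/pg/work /mnt/cloud/pg
    run_backup(Path(sys.argv[1]), Path(sys.argv[2]))
```

Copying into a timestamped folder keeps one hour's backup from overwriting the previous one.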
--
Jay


On Nov 4, 2014, at 7:13 PM, "Tom Fischer" <tom@xxxxxxxx> wrote:

Hello,

we have a problem with the xlog directory (5 GB in size). When we take a
backup of one of our databases (about 80 GB of data) and restore it with our
own script, called from recovery.conf, recovery starts consuming the WAL files
fetched from our NAS at a rate of about 2 per second.

PostgreSQL does not clean up the xlog directory fast enough, so it fills up
and the next required WAL file cannot be copied.
My script then pauses for 5 minutes. Sometimes the xlog directory is cleaned
up a little (from 100% to 84% full) and the cycle starts again. Once there was
also some kind of lock, and nothing happened until I restarted the cluster. On
a restart the directory is cleaned up and about half of the 5 GB becomes
available.
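The pause-and-retry behaviour described above could look roughly like this, assuming the script is used as a restore_command that receives `%f` and `%p`. Tom's actual script is not shown, so the archive path, threshold, script name and function names here are hypothetical. Note that pausing only delays the copy; it does not itself free any space in the xlog directory.

```python
from __future__ import annotations

import shutil
import sys
import time
from pathlib import Path

# Hypothetical settings; the real script and paths are not in the thread.
NAS_ARCHIVE = Path("/mnt/nas/wal_archive")
MIN_FREE_BYTES = 64 * 16 * 1024 * 1024  # room for 64 segments of 16 MB
PAUSE_SECONDS = 300                     # the 5-minute pause from the mail


def has_room(directory: Path, min_free_bytes: int) -> bool:
    """True if the filesystem holding `directory` has enough free space."""
    return shutil.disk_usage(directory).free >= min_free_bytes


def restore_segment(archive: Path, wal_name: str, dest: Path,
                    min_free_bytes: int, pause_seconds: float) -> int:
    """Copy one archived WAL file to the path PostgreSQL asked for.

    Returns 0 on success and 1 if the file is not in the archive,
    because restore_command must exit non-zero for a missing file.
    """
    source = archive / wal_name
    if not source.is_file():
        return 1
    # Wait while the filesystem holding the destination is too full.
    while not has_room(dest.parent, min_free_bytes):
        time.sleep(pause_seconds)
    shutil.copyfile(source, dest)
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical recovery.conf line:
    # restore_command = 'python3 /usr/local/bin/restore_wal.py %f %p'
    sys.exit(restore_segment(NAS_ARCHIVE, sys.argv[1], Path(sys.argv[2]),
                             MIN_FREE_BYTES, PAUSE_SECONDS))
```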

checkpoint_segments is set to 64 and the WAL segment size is 16 MB, so at
about 300 WAL files the game is over, even though there should never be more
than 3 * 64 + xxx files, which should be under 200.
Why do we get far more than 300 WAL files, and what can we do about it? Even
a 10 GB xlog directory is not enough to reach a stable balance.
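The numbers above can be checked with a little arithmetic. The bound below is the one the PostgreSQL 9.x "WAL Configuration" documentation gives for normal operation: (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 files, read here as the larger of the two. Treating the "3 * 64" above as checkpoint_completion_target = 1.0 is an assumption (the default is 0.5), and because the documentation states the bound for normal operation, it may not describe archive recovery. The function names are hypothetical.

```python
SEGMENT_MB = 16  # WAL segment size from the mail


def segments_that_fit(dir_size_gb: float, segment_mb: int = SEGMENT_MB) -> int:
    """How many WAL segments a directory of the given size can hold."""
    return int(dir_size_gb * 1024 // segment_mb)


def normal_max_segments(checkpoint_segments: int,
                        checkpoint_completion_target: float = 0.5,
                        wal_keep_segments: int = 0) -> int:
    """Documented normal upper bound on pg_xlog files (larger of the two forms)."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return int(max(by_checkpoints, by_keep))


print(segments_that_fit(5))          # 320
print(segments_that_fit(10))         # 640
print(normal_max_segments(64, 1.0))  # 193
print(normal_max_segments(64))       # 161
```

So a 5 GB directory holds about 320 segments, well above the roughly 193 the formula allows, which is why the pile-up points to something other than normal checkpoint behaviour.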

Kind regards
Tom Fischer