Re: pg_xlog size growing until it fills the partition

Jeff Janes <jeff.janes@xxxxxxxxx> · Mon, 7 Oct 2013 09:07:57 -0700

On Mon, Oct 7, 2013 at 6:23 AM, Marcin Mańk <marcin.mank@xxxxxxxxx> wrote:

On Thu, Oct 3, 2013 at 11:56 PM, Michal TOMA <mt@xxxxxxxxxx> wrote:

This is what I can see in the log:

2013-10-03 13:58:56 CEST   LOG:  checkpoint starting: xlog

2013-10-03 13:59:56 CEST   LOG:  checkpoint complete: wrote 448 buffers (0.2%); 0 transaction log file(s) added, 9 removed, 18 recycled; write=39.144 s, sync=12102.311 s, total=12234.608 s; sync files=667, longest=181.374 s, average=18.144 s

2013-10-03 22:30:25 CEST   LOG:  checkpoint starting: xlog time

From your logs, it seems that the writes are spread all over the (fairly large) database. Is that correct? What is the database size? What is the size of the working data set (i.e. the set of rows that are in use)?

I have heard of people getting good results by setting shared_buffers low (around 128MB) in high-write-activity scenarios. Setting it that low would mean that checkpoints have 16 times less to do.

It looks like most of the actual writing is being done by either the background writer or the backends themselves, not the checkpointer. And the checkpointer still has to sync all the files, so lowering shared_buffers further is unlikely to help.
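To put numbers on that, here is a rough Python sketch that pulls the timing fields out of the checkpoint line quoted above and works out how much of the checkpoint was spent in the sync phase. The function name and the regular expressions are my own illustration, not anything PostgreSQL ships, and the sketch assumes the line has exactly the layout shown in this thread.

```python
import re

# Checkpoint line from the log quoted in this thread (timestamp dropped).
LINE = (
    "checkpoint complete: wrote 448 buffers (0.2%); "
    "0 transaction log file(s) added, 9 removed, 18 recycled; "
    "write=39.144 s, sync=12102.311 s, total=12234.608 s; "
    "sync files=667, longest=181.374 s, average=18.144 s"
)


def parse_checkpoint(line):
    """Hypothetical helper: return the numeric fields of a checkpoint line."""
    stats = {}
    # Timing fields all look like "name=<seconds> s".
    for name, value in re.findall(
        r"(write|sync|total|longest|average)=([\d.]+) s", line
    ):
        stats[name] = float(value)
    # Counts are plain integers.
    stats["buffers"] = int(re.search(r"wrote (\d+) buffers", line).group(1))
    stats["sync files"] = int(re.search(r"sync files=(\d+)", line).group(1))
    return stats


stats = parse_checkpoint(LINE)

# Fraction of the whole checkpoint spent writing vs. syncing.
write_share = stats["write"] / stats["total"]
sync_share = stats["sync"] / stats["total"]

# Cross-check: files synced times average sync time should be close to
# the reported sync total (the average is rounded in the log).
sync_estimate = stats["sync files"] * stats["average"]

print(f"write share of checkpoint: {write_share:.1%}")
print(f"sync share of checkpoint:  {sync_share:.1%}")
print(f"files * average sync time: {sync_estimate:.1f} s")
```

With only 448 buffers written by the checkpoint itself and nearly all of the elapsed time going to the sync phase, the bottleneck here looks like the storage rather than the amount of work the checkpoint has to write.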

I don't think he ever gave us the specs of the RAID he is using. My guess is that it is way too small for the workload.

Cheers,

Jeff