On Mon, Oct 7, 2013 at 6:23 AM, Marcin Mańk <marcin.mank@xxxxxxxxx> wrote:
On Thu, Oct 3, 2013 at 11:56 PM, Michal TOMA <mt@xxxxxxxxxx> wrote:
This is what I can see in the log:
2013-10-03 13:58:56 CEST LOG: checkpoint starting: xlog
2013-10-03 13:59:56 CEST LOG: checkpoint complete: wrote 448 buffers (0.2%); 0 transaction log file(s) added, 9 removed, 18 recycled; write=39.144 s, sync=12102.311 s, total=12234.608 s; sync files=667, longest=181.374 s, average=18.144 s
2013-10-03 22:30:25 CEST LOG: checkpoint starting: xlog time
From your logs, it seems that the writes are spread all over the (fairly large) database. Is that correct? What is the database size? What is the size of the working data set (i.e. the set of rows that are in use)?
I have heard of people getting good results by setting a low value for shared_buffers (like 128MB) in high write activity scenarios. Setting it that low would mean that checkpoints would have 16 times less to do.
It looks like most of the actual writing is being done by either the background writer or the backends themselves, not the checkpointer. And the checkpointer still has to sync all the files, so lowering shared_buffers further is unlikely to help.
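
As a rough sketch of that reasoning, the numbers can be pulled straight out of the "checkpoint complete" line quoted above. The function name summarize_checkpoint is made up for illustration, and the 8 kB block size is an assumption (it is the PostgreSQL default, but nothing in this thread confirms the build settings).

```python
import re

# Values copied from the "checkpoint complete" line quoted in this thread.
LINE = (
    "2013-10-03 13:59:56 CEST LOG: checkpoint complete: wrote 448 buffers (0.2%); "
    "0 transaction log file(s) added, 9 removed, 18 recycled; "
    "write=39.144 s, sync=12102.311 s, total=12234.608 s; "
    "sync files=667, longest=181.374 s, average=18.144 s"
)

# Assumption: default PostgreSQL block size of 8 kB.
BLOCK_SIZE_BYTES = 8192

PATTERN = re.compile(
    r"wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\).*?"
    r"write=(?P<write>[\d.]+) s,.*?sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s; sync files=(?P<files>\d+)"
)


def summarize_checkpoint(line):
    """Return a small dict of figures from a 'checkpoint complete' line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    buffers = int(m.group("buffers"))
    sync_s = float(m.group("sync"))
    total_s = float(m.group("total"))
    return {
        "buffers": buffers,
        # Data the checkpoint itself wrote, in MiB.
        "mib_written": buffers * BLOCK_SIZE_BYTES / (1024 * 1024),
        "sync_files": int(m.group("files")),
        # Fraction of the checkpoint's total time spent in the sync phase.
        "sync_share": sync_s / total_s,
    }


if __name__ == "__main__":
    summary = summarize_checkpoint(LINE)
    print(summary)
```

Under those assumptions the checkpoint wrote only a few MiB of its own, while nearly all of its elapsed time went into syncing 667 files, which is why shrinking the checkpoint's write volume would not be expected to change much.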
I don't think he ever gave us the specs of the RAID he is using. My guess is that it is way too small for the workload.
Cheers,
Jeff