Re: Server hanging with high CPU wait -- Checkpoint issue?

Jeff Janes <jeff.janes@xxxxxxxxx> · Mon, 3 Feb 2014 12:35:25 -0800

On Mon, Feb 3, 2014 at 11:16 AM, George Woodring <george.woodring@xxxxxxxxxx> wrote:

I am running 9.3.2 on CentOS 6.5

We have a new server we are migrating to that seems to lock up for up to 30sec at a time with all of the processes in wait.  I am seeing a weird output with the checkpoint logs and am wondering if I need to adjust something with the checkpointer config.

My checkpoint settings are:

checkpoint_completion_target = 0.9
checkpoint_timeout = 12min
checkpoint_segments = 50		# in logfile segments, min 1, 16MB each

checkpoint_warning = 300s		# 0 disables

When the box is running will, the checkpoint logs look like the following, checkpoints restarting every 12 minutes and it completing in about 10 minutes.

Feb  3 11:00:09 sponge07b postgres[2754]: [2651-1] LOG:  checkpoint starting: time
Feb  3 11:10:57 sponge07b postgres[2754]: [2652-1] LOG:  checkpoint complete: wrote 32352 buffers (6.2%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=647.749 s, sync=0.110 s, total=647.908 s; sync files=424, longest=0.017 s, average=0.000 s

...

When we are having the issue just now, the checkpoint logs look like the following.  Checkpoints running every 12 minutes, but the checkpoint is written after 2-3 minutes

Feb  3 12:24:09 sponge07b postgres[2754]: [2665-1] LOG:  checkpoint starting: time
Feb  3 12:27:43 sponge07b postgres[2754]: [2666-1] LOG:  checkpoint complete: wrote 10127 buffers (1.9%); 0 transaction log file(s) added, 4 removed, 28 recycled; write=197.510 s, sync=15.975 s, total=213.746 s; sync files=440, longest=9.089 s, average=0.022 s

Probably what is happening here is that the backends or the bgwriter are writing out dirty buffers (because they need those buffers for other purposes) that have been marked for the checkpointer to write.  The checkpointer then thinks it is falling behind, because it does not realize that other processes are clearing its flags, so it finishes early.

This estimation problem perhaps can be fixed in a future version of PostgreSQL, for example by including in the fsync queue information about how many buffers flagged for the checkpointer have been written by others, or by adding a bit to the buffer headers for buffers that were flagged for the checkpointer but were written by someone else first.  

However, this would probably not solve the true problem, as I think the foreshortened checkpoint is probably just a symptom and not the cause.  The most likely cause is that you have a spike in buffer dirtying, which both causes congestion in your IO system (which actually causes the freezes) and also causes the bgwriter to write out buffers flagged for the checkpointer which in turn causes the checkpointer to finish early.  It is possible that fixing the estimation problem would give the IO more time to un-congest before the fsyncs start landing, but I am not optimistic about that being very effective.

But all of that is for PostgreSQL's hacker to consider.  From the perspective of a PostgreSQL user, I'd ask about the difference in RAM between the old and new server, and what the setting are for /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_background_bytes?  Probably your new server is allowing a lot more dirty data to build up in the kernel's buffers, which then causes serious congestion when those buffers need to get written for real.

Cheers,

Jeff