Re: Occasional Slow Commit

"David Rees" <drees76@xxxxxxxxx> · Wed, 29 Oct 2008 15:30:19 -0700

On Wed, Oct 29, 2008 at 6:26 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
> The CentOS 4.7 kernel will happily buffer about 1.6GB of writes with that
> much RAM, and the whole thing can get slammed onto disk during the final
> fsync portion of the checkpoint.  What you should do first is confirm
> whether or not the slow commits line up with the end of the checkpoint,
> which is easy to see if you turn on log_checkpoints.  That gives you timings
> for the write and fsync phases of the checkpoint which can also be
> informative.

OK, log_checkpoints is turned on to see if any delays correspond to
checkpoint activity...

>> Reading this page[2] indicates that I may want to increase my
>> checkpoint_segments, checkpoint_timeout and bgwriter settings, but
>> looking at pg_stat_bgwriter seems to indicate that my current settings
>> are probably OK?
>>
>> # select * from pg_stat_bgwriter;
>>  checkpoints_timed | checkpoints_req | buffers_checkpoint
>>               3834 |             105 |          3,091,905
>>
>>  buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
>>          25876 |              110 |       2,247,576 |     2,889,873
>
> I reformatted the above to show what's happening a bit better.

Sorry, gmail killed the formatting.
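
For what it's worth, the split between timed and requested checkpoints
in those counters is easy to work out. The following is only a rough
sketch using the numbers quoted above; the helper name
checkpoint_summary is made up for illustration and is not part of
PostgreSQL.

```python
def checkpoint_summary(stats):
    """Summarize pg_stat_bgwriter counters (hypothetical helper).

    Returns percentages: how many checkpoints were requested rather
    than timed, and which process wrote the dirty buffers.
    """
    total_ckpt = stats["checkpoints_timed"] + stats["checkpoints_req"]
    total_written = (stats["buffers_checkpoint"]
                     + stats["buffers_clean"]
                     + stats["buffers_backend"])
    return {
        "pct_checkpoints_requested": 100.0 * stats["checkpoints_req"] / total_ckpt,
        "pct_written_by_checkpoint": 100.0 * stats["buffers_checkpoint"] / total_written,
        "pct_written_by_bgwriter": 100.0 * stats["buffers_clean"] / total_written,
        "pct_written_by_backends": 100.0 * stats["buffers_backend"] / total_written,
    }


# Counters copied from the pg_stat_bgwriter output quoted in this thread.
stats = {
    "checkpoints_timed": 3834,
    "checkpoints_req": 105,
    "buffers_checkpoint": 3091905,
    "buffers_clean": 25876,
    "maxwritten_clean": 110,
    "buffers_backend": 2247576,
    "buffers_alloc": 2889873,
}

summary = checkpoint_summary(stats)
for name, pct in summary.items():
    print(f"{name}: {pct:.1f}%")
```

With these counters, requested checkpoints come out to under 3% of the
total, which fits the observation that most checkpoints are the timed
ones.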

> Most of your
> checkpoints are the timed ones, which are unlikely to cause interference
> from a slow commit period (the writes are spread out over 5 minutes in those
> cases).  It's quite possible the slow periods are coming only from the
> occasional requested checkpoints, which normally show up because
> checkpoint_segments is too low and you chew through segments too fast.  If
> your problems line up with checkpoint time, you would likely gain some
> benefit from increasing checkpoint_segments to spread out the checkpoint
> writes further; the 10 you're using now is on the low side for your
> hardware.

OK, I've also bumped up checkpoint_segments to 20 and
checkpoint_completion_target to 0.7 in an effort to reduce the effect
of checkpoints.
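
As a back-of-the-envelope sketch of what those two changes mean, the
snippet below estimates the WAL volume that triggers a requested
checkpoint and the window over which a timed checkpoint's writes are
spread. It assumes the standard 16MB WAL segment size, a
checkpoint_timeout of 5 minutes (the figure mentioned above), and a
previous checkpoint_completion_target of 0.5, which is the 8.3 default
and is an assumption here since the old value is not stated in this
message. The function names are hypothetical, and the real pacing logic
also takes WAL consumption into account, so treat this as a
simplification.

```python
WAL_SEGMENT_MB = 16  # standard WAL segment size (assumed, not rebuilt)


def wal_between_checkpoints_mb(checkpoint_segments):
    """Approximate WAL volume that forces a requested checkpoint."""
    return checkpoint_segments * WAL_SEGMENT_MB


def write_window_seconds(checkpoint_timeout_s, completion_target):
    """Approximate time over which a timed checkpoint spreads its writes."""
    return checkpoint_timeout_s * completion_target


CHECKPOINT_TIMEOUT_S = 5 * 60  # assumed: checkpoint_timeout = 5min

# (checkpoint_segments, checkpoint_completion_target) before and after.
# The "before" target of 0.5 is an assumption (the 8.3 default).
settings = {"before": (10, 0.5), "after": (20, 0.7)}

for label, (segments, target) in settings.items():
    wal_mb = wal_between_checkpoints_mb(segments)
    window = write_window_seconds(CHECKPOINT_TIMEOUT_S, target)
    print(f"{label}: ~{wal_mb} MB of WAL per requested checkpoint, "
          f"writes spread over ~{window:.0f} s")
```

Under those assumptions, doubling checkpoint_segments doubles the
amount of WAL that can be written before a checkpoint is forced, and
the higher completion target stretches the write phase of each timed
checkpoint.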

> If the problems don't go away after that, you may be suffering from
> excessive Linux kernel buffering instead.  I've got a blog entry showing how
> I tracked down a similar long pause on a Linux server at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html you
> may find helpful for determining if your issue is this one (which is pretty
> common on RHEL systems having relatively large amounts of RAM) or if it's
> something else, like the locking you mentioned.

Ah, interesting. I've also turned down the dirty_ratio and
dirty_background_ratio as suggested, but I don't think this would be
affecting things here. The rate of IO on this server is very low
compared to what it's capable of.
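
To put rough numbers on the kernel buffering side, here is a small
sketch of how the two ratios translate into megabytes of dirty pages.
Everything in it is an assumption for illustration: the 4GB of RAM is
not stated in this message (it is picked only because 40% of it matches
the roughly 1.6GB figure quoted above), the 10/40 pair is what I
believe the old kernel defaults to be, and the lowered 5/10 pair is
just an example. The kernel also works from dirtyable memory rather
than total RAM, so these are upper-bound estimates, and dirty_limits_mb
is a hypothetical helper name.

```python
def dirty_limits_mb(mem_total_mb, dirty_background_ratio, dirty_ratio):
    """Estimate the dirty-page thresholds implied by the vm.dirty_* ratios.

    Returns (background_mb, hard_mb): roughly where background writeback
    starts, and where writers get blocked until pages are flushed.
    This uses total RAM, which overstates the kernel's real limits.
    """
    background_mb = mem_total_mb * dirty_background_ratio / 100.0
    hard_mb = mem_total_mb * dirty_ratio / 100.0
    return background_mb, hard_mb


MEM_TOTAL_MB = 4096  # assumed RAM size, not taken from this thread

# (dirty_background_ratio, dirty_ratio); both pairs are illustrative.
scenarios = {"assumed defaults": (10, 40), "lowered": (5, 10)}

for label, (bg_ratio, hard_ratio) in scenarios.items():
    bg_mb, hard_mb = dirty_limits_mb(MEM_TOTAL_MB, bg_ratio, hard_ratio)
    print(f"{label}: background writeback at ~{bg_mb:.0f} MB, "
          f"writers blocked at ~{hard_mb:.0f} MB")
```

On a live box the current values can be read from
/proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio.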

Thanks for the suggestions, I'll report back with results.

-Dave

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)