Re: checkpoint spikes

Greg Smith <greg@xxxxxxxxxxxxxxx> · Thu, 10 Jun 2010 16:00:54 -0400

Janning wrote:
1. With raising checkpoint_timeout, is there any downgrade other than slower 
after-crash recovery?

Checkpoint spikes happen when too much I/O has been saved up for 
checkpoint time than the server can handle.  While this is normally 
handled by the checkpoint spreading logic, you may find that with your 
limited disk configuration there's no other way to handle the problem 
but to make checkpoints much more frequent, rather than slower.  At 
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm I 
write about how you had to tune PostgreSQL 8.0 to 8.2 in order to keep 
checkpoint spikes from getting too big.  You may have to follow those 
same techniques for your server.  Just don't try to increase the 
background writer settings in your case--the 8.3 one is different enough 
that you can't tune that the way was suggested for 8.2.

2. Is there a way to calculate the after-crash recovery time with a certain 
checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60 
minutes?

Simulate it.  No way to estimate.

3. Is it sane to set checkpoint_timeout to 120min or even to 600min?

Checkpoints happen when you reach either checkpoint_segments of WAL 
written *or* reach checkpoint_timeout, whichever happens first.  You'd 
have to set both to extremely large values to get checkpoints to happen 
really infrequently.  Which I suspect is the exactly opposite of what 
you want--you can't handle the spike from a long delayed checkpoint, and 
probably want to tune for shorter and smaller ones instead.

Every now and then we run into someone who had to retune their system to 
something like:

shared_buffers=512MB
checkpoint_segments=3

In order to avoid spikes from killing them.  That may be the direction 
you have to head.  The longer the time between checkpoints, the bigger 
the spike at the end is going to be to some extend; you can't completely 
spread that out.

5. Does anybody know if I can set dirty_background_ratio to 0.5? As we have 12 
GB RAM and rather slow disks 0,5% would result in a maximum of 61MB dirty 
pages. 

Nope.  Linux has absolutely terrible controls for this critical 
performance parameter.   The sort of multi-second spikes you're seeing 
are extremely common and very difficult to get rid of.

PS: Do I need to post this question on pgsql-perfomance? If so, please let me 
know.

That would have been the better list for it originally.  I also wrote 
something about a technique that uses pg_stat_bgwriter snapshots to help 
model what the server is doing in these cases better you might find 
useful on the admin list, it's at 
http://archives.postgresql.org/pgsql-admin/2010-06/msg00074.php

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx   www.2ndQuadrant.us

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general