Re: checkpoint spikes

Janning <ml@xxxxxxxxxxxx> · Fri, 11 Jun 2010 09:28:19 +0200

On Thursday 10 June 2010 22:00:54 Greg Smith wrote:
> Janning wrote:
> > 1. With raising checkpoint_timeout, is there any downgrade other than
> > slower after-crash recovery?
>
> Checkpoint spikes happen when too much I/O has been saved up for
> checkpoint time than the server can handle.  While this is normally
> handled by the checkpoint spreading logic, you may find that with your
> limited disk configuration there's no other way to handle the problem
> but to make checkpoints much more frequent, rather than slower.  

Uhh! I had so much success with less frequent checkpoints. At least the spike 
is the same but it does not happen so often.

> At
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm I
> write about how you had to tune PostgreSQL 8.0 to 8.2 in order to keep
> checkpoint spikes from getting too big.  You may have to follow those
> same techniques for your server.  Just don't try to increase the
> background writer settings in your case--the 8.3 one is different enough
> that you can't tune that the way was suggested for 8.2.

we are runing 8.4 and of course I read your article. I just raised 
bgwriter_lru_maxpage to 200 :-(

most docs I found relates to 8.2 and 8.3. In Things of checkpoints, is 8.4 
comparable to 8.3? It would be nice if you update your article to reflect 8.4.

> > 2. Is there a way to calculate the after-crash recovery time with a
> > certain checkpoint_timeout? How long would be approx. for a
> > checkpoint_timeout of 60 minutes?
>
> Simulate it.  No way to estimate.

Ok. won't try it now :-)

> > 3. Is it sane to set checkpoint_timeout to 120min or even to 600min?
>
> Checkpoints happen when you reach either checkpoint_segments of WAL
> written *or* reach checkpoint_timeout, whichever happens first.  

Sorry the question was not precise. I already raised checkpoint_segments to 
reach that goal of less frequent checkpoints.

> You'd
> have to set both to extremely large values to get checkpoints to happen
> really infrequently.  Which I suspect is the exactly opposite of what
> you want--you can't handle the spike from a long delayed checkpoint, and
> probably want to tune for shorter and smaller ones instead.
>
> Every now and then we run into someone who had to retune their system to
> something like:
>
> shared_buffers=512MB
> checkpoint_segments=3
>
> In order to avoid spikes from killing them.  That may be the direction
> you have to head.  The longer the time between checkpoints, the bigger
> the spike at the end is going to be to some extend; you can't completely
> spread that out.

I am really afraid of doing it right now. In my experience the spike is the 
same but we only have it once an hour.

> > 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we
> > have 12 GB RAM and rather slow disks 0,5% would result in a maximum of
> > 61MB dirty pages.
>
> Nope.  Linux has absolutely terrible controls for this critical
> performance parameter.   The sort of multi-second spikes you're seeing
> are extremely common and very difficult to get rid of.

ok, thanks.

> > PS: Do I need to post this question on pgsql-perfomance? If so, please
> > let me know.
>
> That would have been the better list for it originally.  I also wrote
> something about a technique that uses pg_stat_bgwriter snapshots to help
> model what the server is doing in these cases better you might find
> useful on the admin list, it's at
> http://archives.postgresql.org/pgsql-admin/2010-06/msg00074.php

thank you very much for your help!

best regards
Janning

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general