Re: Slow Query / Check Point Segments

Greg Smith <greg@xxxxxxxxxxxxxxx> · Fri, 22 Jan 2010 01:21:59 -0500

Alex - wrote:

  checkpoint_segments = 32                # in logfile segments,
min 1, 16MB each
  checkpoint_timeout = 30min              # range 30s-1h

These parameters are not so interesting on their own.  The important
thing to check is how often checkpoints are happening, and how much
work each of them do.  Here's a useful section from your logs to
comment on:

  2010-01-22 12:21:48 JSTLOG:  checkpoint complete: wrote 83874
buffers (16.0%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=138.040 s, sync=0.000 s, total=138.063 s
  2010-01-22 12:23:32 JSTLOG:  checkpoint complete: wrote 82856
buffers (15.8%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=18.740 s, sync=0.000 s, total=18.783 s
  2010-01-22 12:24:26 JSTLOG:  checkpoint complete: wrote 75145
buffers (14.3%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=12.129 s, sync=0.000 s, total=12.132 s
  2010-01-22 12:25:30 JSTLOG:  checkpoint complete: wrote 82108
buffers (15.7%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=10.619 s, sync=0.000 s, total=10.621 s
  2010-01-22 12:28:03 JSTLOG:  checkpoint complete: wrote 87349
buffers (16.7%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=82.190 s, sync=0.000 s, total=82.192 s

Here you're getting a checkpoint every minute or three, and each of
them is writing out ~80000 buffers = 625MB.  That is crazy, and no
wonder your queries are slow--the system is spending all of its time
doing constant, extremely expensive checkpoints.

You should re-tune this system until the checkpoints show up no more
often than every 5 minutes, and preferably closer to 10.  In your case,
that means greatly increasing checkpoint_segments..  And you might as
well decrease checkpoint_timeout, because right now the timeout is
rarely ever being reached before the system runs out of working
segments and executes a checkpoint for that reason.

I would suggest changing the parameters to these:

checkpoint_segments = 96                # in logfile segments, min
1, 16MB each
checkpoint_timeout = 10min              # range 30s-1h

And seeing how long the interval between checkpoints becomes under load
afterwards.  That should make it in the just >5 minute range.  I'd
expect that the number of buffers will only increase a bit, so instead
of dumping out 625MB every minute or three you should see maybe 800MB
every 5 minutes--big improvement.

Also:  just after making the change, save a copy of:

select now(),* from pg_stat_bgwriter

And then run that same query again a few hours after the change (and
maybe the next day twoo).  Subtract the two values to see how much they
changed, and then you'll find some useful info to compute the
checkpoint interval without even looking at the logs.  There's a lot of
theory and comments about this area at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

  2010-01-22 12:36:08 JSTLOG:  checkpoint complete: wrote 80819
buffers (15.4%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=51.476 s, sync=0.000 s, total=51.478 s
  2010-01-22 13:01:54 JSTLOG:  checkpoint complete: wrote 4892
buffers (0.9%); 0 transaction log file(s) added, 0 removed, 32
recycled; write=494.868 s, sync=0.000 s, total=494.982 s

See that last one?  That's a checkpoint that's being driven by time
expiring (30 minutes) rather than running out of segments.  That just
suggests that the system was mostly idle during that period.  Long
write times are perfectly fine here, the checkpoints are spread out
over a long time if possible in order to reduce average I/O.  You're
just not seeing that the rest of the time because checkpoints are
happening so often.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.com