checkpoint spikes

Janning <ml@xxxxxxxxxxxx> · Wed, 9 Jun 2010 18:53:23 +0200

Hi,

we currently encounter an increasing load on our website. With the increasing 
load we see some problems on our database. so we checked what happens and we 
saw spikes in our load when checkpoints are about to finish.

Our configuration:

max_connections = 125
ssl = false
shared_buffers = 500MB
work_mem = 15MB
maintenance_work_mem = 250MB
synchronous_commit = off
full_page_writes = off

checkpoint_segments = 10
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9

random_page_cost = 2
effective_cache_size = 5000MB

autovacuum = off

(we put autovacuum to off because we suspected it).

all the other parameters are set to default (beside log parameters and so on). 

Our machine has 12 GB RAM, i7-975 CPU and a SW-Raid-1 for datadir and another 
one for the rest of the server (including postgresql logfiles). Disk are 
"Barracuda 7200.11 SATA 3Gb/s 1.5-TB" and we are running debian lenny.

these are our checkpoint log statements:

2010-06-09 17:24:27 CEST [6375]: [1-1] LOG:  checkpoint starting: time
2010-06-09 17:28:09 CEST [6375]: [2-1] LOG:  checkpoint complete: wrote 2861 
buffers (4.5%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=193.057 s, sync=29.259 s, total=222.353 s
2010-06-09 17:34:27 CEST [6375]: [3-1] LOG:  checkpoint starting: time
2010-06-09 17:39:09 CEST [6375]: [4-1] LOG:  checkpoint complete: wrote 3247 
buffers (5.1%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=255.255 s, sync=26.911 s, total=282.177 s
2010-06-09 17:44:27 CEST [6375]: [5-1] LOG:  checkpoint starting: time
2010-06-09 17:49:41 CEST [6375]: [6-1] LOG:  checkpoint complete: wrote 2746 
buffers (4.3%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=280.743 s, sync=33.392 s, total=314.147 s
2010-06-09 17:54:27 CEST [6375]: [7-1] LOG:  checkpoint starting: time
2010-06-09 17:58:59 CEST [6375]: [8-1] LOG:  checkpoint complete: wrote 3118 
buffers (4.9%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=253.293 s, sync=18.585 s, total=271.892 s
2010-06-09 18:04:27 CEST [6375]: [9-1] LOG:  checkpoint starting: time
2010-06-09 18:08:46 CEST [6375]: [10-1] LOG:  checkpoint complete: wrote 2695 
buffers (4.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=225.173 s, sync=33.789 s, total=258.972 s
2010-06-09 18:14:27 CEST [6375]: [11-1] LOG:  checkpoint starting: time
2010-06-09 18:18:30 CEST [6375]: [12-1] LOG:  checkpoint complete: wrote 2868 
buffers (4.5%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=215.561 s, sync=27.701 s, total=243.271 s

What we saw is a rather long sync time. And exactly at this time, our 
responses become slow and the server load increases.

this is from pg_stat_bgwriter:

 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | 
maxwritten_clean | buffers_backend | buffers_alloc 
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
              3495 |               0 |            9070242 |      10798927 |            
14421 |         6412707 |     208340755

We tried to increase checkpoint_timeout to 20 minutes but it was getting 
worse:

2010-06-09 18:34:27 CEST [6375]: [13-1] LOG:  checkpoint starting: time
2010-06-09 18:42:49 CEST [6375]: [14-1] LOG:  checkpoint complete: wrote 2956 
buffers (4.6%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=448.265 s, sync=54.087 s, total=502.377 s

here we see a 54 sec sync time and a much higher load on sync time compared to 
the 10 minutes checkpoint_timeout.

do you have any hints for us how to tune our configuration to avoid spikes?

kind regards
Janning

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general