On my primary DB I'm observing random slowness which just doesn't make sense to me. The I/O system can easily do 40MB/sec writes, but I'm only seeing a sustained 5MB/sec, even as the application is stalling waiting on the DB. My only guess is that I'm getting hit by checkpoints too often, and this is causing some delays. I did a lot of reading and came across this most excelelent article by Greg Smith (http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm). Reading that and the Pg manual and looking at the statistics, here is what I think I need to do. (Greg, do you have a performance tuning book? If so, I really want to buy it! Your articles are awesome.) current settings for checkpoints: checkpoint_segments = 32 checkpoint_timeout = 900 checkpoint_completion_target = 0.9 log_checkpoints = on In my logs I see this pretty consistently during times of slowness: Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint complete: wrote 52828 buffers (24.3%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=265.881 Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint complete: wrote 52773 buffers (24.3%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=234.846 Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint complete: wrote 53807 buffers (24.7%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=299.170 basically, the next checkpoint starts within a few seconds of the prior one completing. The stats show this: # select * from pg_stat_bgwriter; -[ RECORD 1 ]------+----------- checkpoints_timed | 3236 checkpoints_req | 83044 buffers_checkpoint | 1376460896 buffers_clean | 59124159 maxwritten_clean | 304410 buffers_backend | 285595787 buffers_alloc | 6643047623 Based on Greg's article and the above number showing that most checkpoints are triggered by running out of WAL segments, I should increase my checkpoint_buffers. Also, based on the article, I should increase the bgwriter_lru_maxpages (currently at default 100). My plan is to bump checkpoint_segments to 48 and reduce checkpoint_completion_target to 0.7, and bump the bgwriter_lru_maxpages to 500. Can the checkpoint operation actually cause the DB to stop responding for a few seconds at a time? That seems to be what I observe. Sometimes for 5 or more seconds one transaction will just stall. Thanks for any ideas. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general